How to work with the %YM%m (2022M01) date format in R

Working with data from different institutional or governmental sources, I sometimes get data in weird formats that I need to turn into something a programming language can process. NHSN dates in particular make me want to scream. If you work with NHSN (the CDC’s National Healthcare Safety Network) and export data by month, you’re probably familiar with the date format 2022M01. (Why, CDC, why?!) Now you may be wondering, “Why dedicate an entire post to this weird, obscure format?” When I went looking for answers on the ole Google-machine, I found little to nothing to go off of, and I tested quite a few strategies before finding one that works. Hopefully this post will save someone else a future headache.

Convert %YM%m to a standard date format

First, let’s break this format apart. The first four digits are the year, the M stands for month, and the last two digits are the month number (e.g., January = 01). As you can imagine, R can’t make sense of this as a date, so it reads it in as a character variable instead. Since there isn’t a day number, we have to finesse it a little before converting it into a usable date: use the paste0() function to tack “01” onto the end (or whatever day of the month you need), then tell as.Date() that the format is “%YM%m%d”. If you skip the day, R doesn’t have a complete date to parse, and as.Date() returns NA instead of a date.

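library(dplyr)  # for %>% and mutate()
new_data <- data_table %>%
  mutate(
    # tack a day onto the end so R has a complete date, then parse it
    date = as.Date(paste0(as.character(date_var), "01"), format = "%YM%m%d"),
    # optional: a character copy of the date in another display format
    new_var = format(date, "new_date_format"))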

*data_table, date_var, new_var, and new_date_format are placeholders you’ll want to swap out for your own.
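To see why the extra day matters, here’s a quick base R check. It’s a minimal sketch using a single made-up value; the comments show what R prints for it:

```r
# A single NHSN-style month value (made-up example)
raw <- "2022M05"

# No day in the string, so as.Date() can't build a complete date
as.Date(raw, format = "%YM%m")
#> [1] NA

# Tack on "01" and describe it with %d; the literal M in the format just has to match
as.Date(paste0(raw, "01"), format = "%YM%m%d")
#> [1] "2022-05-01"
```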

Example:

#libraries
library(tidyverse)
#using May 2022 as an example
mydata <- tibble(summaryym = "2022M05")

new_dates <- mydata %>%
  mutate(date = as.Date(paste0(as.character(summaryym), "01"), format = "%YM%m%d"),
         new_summaryym = format(date, "%Y-%m-%d"))
#original summaryym = 2022M05
#new_summaryym = 2022-05-01
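If you convert these columns often, it can help to wrap the steps in a small function that also catches values that aren’t in the 2022M01 shape. Here’s a minimal base R sketch; nhsn_ym_to_date() is a hypothetical name, not part of any package, and it assumes each value is either NA or exactly four digits, an M, and two digits:

```r
# Hypothetical helper (not from any package): "2022M01"-style strings -> Date.
# Assumes each value is NA or exactly 4 digits + "M" + 2 digits.
nhsn_ym_to_date <- function(x, day = "01") {
  x <- as.character(x)
  ok <- is.na(x) | grepl("^[0-9]{4}M[0-9]{2}$", x)
  if (!all(ok)) {
    stop("Values not in %YM%m form: ", paste(unique(x[!ok]), collapse = ", "))
  }
  # NA inputs become "NA01", which as.Date() turns back into NA
  as.Date(paste0(x, day), format = "%YM%m%d")
}

nhsn_ym_to_date(c("2022M01", "2022M12", NA))

# Inside a pipeline it drops into mutate() like any other function:
# mydata %>% mutate(new_summaryym = nhsn_ym_to_date(summaryym))
```

Note that an out-of-range month such as 2022M13 passes the pattern check but still comes back as NA from as.Date().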

Convert a standard date format to %YM%m

To go the other way, we can use the strptime() function to parse the date and tell R which parts make up the month and year, then use format() to write it back out as %YM%m. You may be thinking, “WHY why why would you want to murk up a perfectly good date into this god-awful format?” In my day job, we have a pre-existing process that breaks if the dates aren’t in this format, so for that process to work, I have to convert them back.

new_data <- data_table %>%
  mutate(date = strptime(as.character(date_var), "date_var_format"),  # parse the current format
         new_var = format(date, "%YM%m"))                            # e.g. 2022M01

date_var = your original date variable
date_var_format = the format date_var is currently in (for example, “%Y-%m-%d”)

Example:

#libraries
library(tidyverse)

#using May 2022 as an example
mydata <- tibble(summaryym = as.Date("2022-05-01"))
                      
new_dates <- mydata %>% mutate(date = strptime(as.character(summaryym), "%Y-%m-%d"), 
       new_summaryym = format(date, "%YM%m"))

#original summaryym = 2022-05-01
#new_summaryym = 2022M05
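Since summaryym in this example is already a Date, the strptime() step is optional: format() works on Date objects directly. Here’s a minimal base R sketch with a few made-up dates, plus a round trip back to the first of the month (the day of the month is dropped along the way):

```r
# Made-up dates that are already of class Date
dates <- as.Date(c("2022-01-01", "2022-05-01", "2022-12-15"))

# format() writes the year, a literal M, and the two-digit month
format(dates, "%YM%m")
#> [1] "2022M01" "2022M05" "2022M12"

# Round trip: back to a Date on the first of each month
as.Date(paste0(format(dates, "%YM%m"), "01"), format = "%YM%m%d")
#> [1] "2022-01-01" "2022-05-01" "2022-12-01"
```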

I’ve tried a few different ways to convert 2022M01 to and from a standard date format, but these two work most consistently. Calling as.Date() or format() directly on the raw 2022M01 string doesn’t do the job (as.Date() returns NA), and using strptime() for the first conversion (from 2022M01 to a standard date) was also a no-go for me. I would love to hear strategies from others who have used this format before… please comment below if you have one to share!
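One more base R route that should also work, offered as a sketch rather than something tested against real NHSN exports: swap the M for a hyphen with sub() so the value looks like ISO “2022-01”, add “-01”, and let as.Date() fall back on its default %Y-%m-%d format. If you only need the year and month as numbers (say, for grouping), substr() can pull them out by position, assuming every value is exactly seven characters long.

```r
x <- c("2022M01", "2022M05")

# "2022M01" -> "2022-01" -> "2022-01-01", which as.Date() parses by default
as.Date(paste0(sub("M", "-", x, fixed = TRUE), "-01"))
#> [1] "2022-01-01" "2022-05-01"

# Year and month as integers, by character position (assumes the 7-character form)
as.integer(substr(x, 1, 4))
#> [1] 2022 2022
as.integer(substr(x, 6, 7))
#> [1] 1 5
```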

Also let us know of any other weird date formats you’ve come across in the wild! Bonus points for random non-numeric characters.
