0

I have a dataframe column that contains a mixture of date formats, for example 30/06/2020,07/2020 and 2020. I would like to convert the four digit numbers into a date (e.g. 2020 -> XX/XX/2020). I have different years, not just 2020, so I would prefer, if possible, a generic expression.

A supplementary question: when I read the data from an excel file, I get five-digit numbers instead of dates. From what I have read, these numbers are the days passed since 1900. Hence, the actual column involves also five-digit numbers, the four-digit numbers that represent the year, and the other days. I have dealed with that issue, but not in an optimal way. Is there a generic way to deal all these formats together? Sorry for the large post

K

  • I am not aware of a function able to parse different date formats into the same column. (Try looking at `lubridate`, it has a ton a functions for dealing with dates). However, if you know thtat the last four digits are representing the year, you can extract the information withouting parsing it to date. See `stringr::str_sub`. For your last question, you can use `janitor::excel_numeric_to_date()` – Celso Silva Mar 03 '21 at 16:06
  • 1
    Please share some sample input (in copy/pasteable valid R syntax) and desired output. If the input is `2020`, do you want a character string with `"XX/XX/2020"`? Or do you want to, say, default to Jan 1 if no month/year is found? Or something else? Your title says "four digit numbers", so do you only need the year? – Gregor Thomas Mar 03 '21 at 16:39
  • This https://stackoverflow.com/a/52319606/3358272 might be useful (specifically the 3rd code block that starts with `formats <- c(...)`), where the `formats` provides candidate formats in order of precedence. – r2evans Mar 03 '21 at 16:47
  • 1
    It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Mar 04 '21 at 01:32

1 Answers1

0

Thank you all for your ideas. You are right, I need to be more specific next time. I focused on solving the problem to be honest I believe I did it.

Regarding the data, a simple illustration might be the following:

date
08/2003
12/06/2002
38054
2004
...
...
...

First, I found which elements of the dataframe column (RHO_DataBase$date) are expressed as a year (e.g. 2003) and convert them to date (e.g. 15/05/2003):

#Step 1
counter1 <- which( (!is.na(as.numeric(RHO_DataBase$date))) & (as.numeric(RHO_DataBase$date)<2030)  )
for (i in counter1) {
  RHO_DataBase$date[i] <- paste ("15/05/",sep="",RHO_DataBase$date[i])
}

Then, I found which elements are expressed in numeric values (days since 30/12/1899), and convert their format to day/month/year

#Step 2
counter2 <- which(!is.na(as.numeric(RHO_DataBase$date)))
for (i in counter2) {
  RHO_DataBase$date[i] <- format(as.Date(as.numeric(RHO_DataBase$date[i]), origin = "1899-12-30"),'%d/%m/%Y')
}

Then, I found the elements of the column that are expressed in the other remaining format, in this case only month/year, and change it to the day/month/year using paste.

# Step 3:
counter3<-which(is.na(as.Date( RHO_DataBase$date, "%d/%m/%Y") ) )
for (i in counter3) {
  RHO_DataBase$date[i] <- paste ("01/",sep="",RHO_DataBase$date[i])
} 

Cheers, K