1

My question has been asked and answered repeatedly here in Stackoverflow but none of the solutions seem to work for me. Could you please help? Here is a sample of my data using dput. dput(sample_n(CA_Complication, 20))

structure(list(ID = c("101", "101", "101", "101", "101", "101", 
"101", "101", "101", "101", "101", "101", "101", "101", "101", 
"101", "101", "101", "101", "101"), AD = c("447", "243", "608", 
"537", "1588", "302", "28", "1035", "755", "8", "1262", "95", 
"1954", "231", "88", "1898", "1028", "669", "103", "773"), Date1 = 
structure(c(1367802000,1324515600, 1404781200, 1391043600, 1621299600, 1340067600, 1291078800,1489453200, 1456966800, 1278986400, 1544058000, 1295398800, 1637629200,1321318800, 1293411600, 1623286800, 1485392400, 1417050000, 1296522000,1431997200), tzone = "UTC", class = c("POSIXct", "POSIXt")), Date2 = c("41400.041666666701", "40913.041666666701",  "41828.041666666701", "41697.041666666701", "44334.041666666701",         "41080.041666666701", "40512.041666666701", "42901.041666666701",         "42443.041666666701", "40372.041666666701", "43535.041666666701",         "40197.041666666701", "44523.041666666701", "40862.041666666701",         "40539.041666666701", "44357.041666666701", "42817.041666666701",         "42016.041666666701", "40575.041666666701", "42143.041666666701")), class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -20L))

       

As you see, There are two date columns which look similar in excel (the cell format for both is date: *14-03-2012. However, when I open the file in R, Date1 is of type Dttm while Date2 is character. I can get my desired date format for Date 1 using the code below:

library("anytime")  
anydate(mydata$Date1)     #yeilds date as yyyy-mm-dd,hh:mm:ss
mydata$Date1 <- substring(mydata$Date1,1,10)     # Remove hh:mm:ss
mydata$Date1 <- format(as.Date(mydata$Date1,'%Y-%m-%d'),'%d-%m-%Y')
class(mydata$Date1)       #character

But I can't get the second Date column in my desired format. If I use the code above it introduces many NAs. I also used this code openxlsx::convertToDateTime(mydata$Date2) leading to the following error: NAs introduced by coercion. indeed NA is introduced only for values 16 and 20 in my Date2 column. It is because these dates are 1809/01/01.

Then I tried this:

library(tibble)
library(janitor)
excel_numeric_to_date(as.numeric(as.character(mydata$Date2), date_system = "modern"))

which led to this warning message:

In excel_numeric_to_date(as.numeric(as.character(mydata$Date2),  :
NAs introduced by coercion

I also tried this:

library("datetimeutils")
convert_date(mydata$Date2, type = "Excel")
Error in charToDate(x) : 
character string is not in a standard unambiguous format

Does anybody know how to deal with such old dates in excel?

Update: I managed to get my code work by saving my data into an Access file and then I imported from Access into R. But still I would love to learn how to deal with it in excel, too.

Rara
  • 105
  • 9
  • 1
    The key is to do a conversion to numeric first, e.g. `as.Date(as.numeric(df$Date2), origin = "1899-12-30")` – Andre Wildberg Jan 13 '23 at 15:01
  • @AndreWildberg Here is the warning that I get: NAs introduced by coercion. I just realized NAs are introduced only when the date equals 09-09-1809. In this dataset it seems that if the date is unknown 09-09-1809 is used and unfortunately I can't change it. – Rara Jan 13 '23 at 15:09
  • Can you include one of the failing lines into your example? – Andre Wildberg Jan 13 '23 at 15:13
  • @AndreWildberg I updated the dput from 10 to 20. In Date2 the 16th and 20th value equal 09-09-1809. But to me it is not clear from the numbers presented here. I hope it helps. – Rara Jan 13 '23 at 15:28
  • Works fine for me on the data with the code I posted above. – Andre Wildberg Jan 13 '23 at 15:32
  • @AndreWildberg For me it works for all dates except 09-09-1809. – Rara Jan 13 '23 at 15:42
  • Checking the cells 16 - 20 of Date2 (`df$Date2[16:20]`), there's no difference and no reason why they should fail. Neither the format (character) nor the range makes them fundamentally different. – Andre Wildberg Jan 13 '23 at 15:46

1 Answers1

1

you have a problem of brackets and an as.character to much here. Does the following work? excel_numeric_to_date(as.numeric(mydata$Date2), date_system = "modern")

I don't get errors with: as.POSIXct(as.numeric(t$Date2),origin = '1970-01-01 00:00.00 UTC') but I don't know the correct origin.

Beni
  • 191
  • 9
  • It works perfectly for all values in Date2 except for some rows where the date equals 09-09-1809. It seems that this date is used to refer to unknown dates but I have no idea how to fix it. I have updated my dput to include these rows (values 16 and 20 in Date2). – Rara Jan 13 '23 at 15:37