1

The problem posed by the user @spore234 appears to be solved: sas7bdat date format to R date format

I, too, uploaded a .sas7bdat file, and I got numbers instead of dates. Unlike @spore234, the use of as.Date(..., origin = "1970-01-01") is not correct but using `as.POSIXct(..., origin = "1970-01-01") gives the right date but not the right time. It is the reverse of the findings from 5 years ago.

To avoid confusion, I opened up a new issue.

dir <- tmpdir()

#The dates in SAS are 
# SUBMITTED_AT                RUN_DATE
# 06APR2021:16:17:02          11DEC2020:05:00:00
# 06APR2021:16:17:02          11DEC2020:05:00:00

wide <- sas7bdat::read.sas7bdat(paste0(dir, "\\sas7bdat_issue.sas7bdat"))
print(wide)
#>   SUBMITTED_AT   RUN_DATE
#> 1   1933345022 1923282000
#> 2   1933345022 1923282000

as.Date(wide$SUBMITTED_AT, origin = "1970-01-01")
#> [1] "5295288-11-18" "5295288-11-18"

# gives the wrong year, which is not correct. Using `as.POSIXct()` with no modification gives the right day but not the right time. 

as.POSIXct(wide$SUBMITTED_AT,origin='1960-01-01')
#> [1] "2021-04-06 12:17:02 EDT" "2021-04-06 12:17:02 EDT"

It's an easy fix, as we can format not to show the time. However, could some numbers get lost in the SAS--R conversion?

as.POSIXct(wide$RUN_DATE, format = "%Y-%m-%d", origin='1960-01-01')
#> [1] "2020-12-11 EST" "2020-12-11 EST"

xfun::session_info("sas7bdat")
#> R version 4.0.4 (2021-02-15)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Locale:
#>   LC_COLLATE=English_United States.1252 
#>   LC_CTYPE=English_United States.1252   
#>   LC_MONETARY=English_United States.1252
#>   LC_NUMERIC=C                          
#>   LC_TIME=English_United States.1252    
#> 
#> Package version:
#>   sas7bdat_0.5

Created on 2021-05-11 by the reprex package (v1.0.0)

SAS (r) Proprietary Software 9.4 (TS1M4)

Huh? These findings are bizarre.

phargart
  • 655
  • 3
  • 14
  • FYI `haven` is an additional SAS to R package https://haven.tidyverse.org/ – M.Viking May 11 '21 at 23:45
  • Your SAS dataset does not have DATE values (number of days). It has DATETIME values (number of seconds). It would make sense that you need to use different conversion logic to handle different types of numbers. – Tom May 12 '21 at 13:20
  • @Tom, so it is the opposite of the problem for spore234? Here is the answer that was given to spore234's question: "POSIXct stores dates as the number of seconds since the origin. The SAS dates are the number of days since the origin." How could SAS would switch like that from DATE to DATETIME values? https://stackoverflow.com/questions/32564703/sas7bdat-date-format-to-r-date-format – phargart May 12 '21 at 13:32
  • Note sure that you mean about "switch". SAS has two types of variables. Fixed length character strings and floating point numbers. By convention they have created formats and informats that convert number of days to/from strings that look to humans like dates. And others that do the same for number of seconds. You could reverse the question and ask why other systems require you to specify the number of seconds after midnight when you are just recording dates. – Tom May 12 '21 at 16:36
  • @Tom, it is just that SAS stored dates the opposite way before in this post https://stackoverflow.com/questions/32564703/sas7bdat-date-format-to-r-date-format -- so applying the solutions listed here didn't work for me. They suggested that SAS uses DTE values and not DATETIME values. You are now suggesting the opposite in that SAS uses DATETIME values and not DATE values. – phargart May 12 '21 at 17:13
  • 1
    That link is to a question about dates. This question is not about dates, it is about datetime values. Two different concepts. Dates do not have a time of day component to them. SAS uses two different numbering systems for those two different concepts. So it makes sense you would need two different R function calls to convert the different types of numbers. – Tom May 12 '21 at 17:17

1 Answers1

3

The timezone offset is causing the issue I think. Try

as.POSIXct(wide$SUBMITTED_AT, origin="1960-01-01", tz="UTC")

The EDT timezone is 4 hours out from UTC, but they are actually the exact same point in time, and this is reflected in terms of how they are stored in R. The only difference is how the times are displayed:

as.POSIXct(wide$SUBMITTED_AT, origin="1960-01-01", tz="America/New_York")
#[1] "2021-04-06 12:17:02 EDT" "2021-04-06 12:17:02 EDT"
as.numeric(as.POSIXct(wide$SUBMITTED_AT, origin="1960-01-01", tz="America/New_York"))
#[1] 1617725822 1617725822

as.POSIXct(wide$SUBMITTED_AT, origin="1960-01-01", tz="UTC")
#[1] "2021-04-06 16:17:02 UTC" "2021-04-06 16:17:02 UTC"
as.numeric(as.POSIXct(wide$SUBMITTED_AT, origin="1960-01-01", tz="UTC"))
#[1] 1617725822 1617725822
thelatemail
  • 91,185
  • 12
  • 128
  • 188