I am importing data from SPSS (sav file). In this data set there are several date columns, several of which contain cells with missing values. All date columns are imported as date variables into R, so far so good. All date entries from SPSS are imported correctly into R, still so far so good. However, some of the missing values in SPSS are interpreted as 1582-10-14 when imported to R. I understand this is the start of the Gregorian calendar, so I thought maybe some of the cells in SPSS are 0 instead of NA, but when checking said cells, they are indeed empty, and look exactly the same as other empty cells which are correctly imported as NA in R.
I am sorry I can't think of a reproducible example and I can't put my SPSS file here, integrityreasons.
I have tried importing using rio (import_list
) and haven(read_sav
), and the same cells are interpreted as 1582-10-14 when using both methods. Maybe it is not random and there is some hidden data in the SPSS file not visible in the SPSS interface?
Any idea why I get this error? Could it be a bug? My experience though is that it seldom is. Has anyone else experienced this?
A possible fix would be just to remove all 1582-10-14 entries in R but I would like know the reason why it occurred. What if more data imported from SPSS is incorrect?
Edit: This post has been suggested as a solution, which it is not. That post concerns R importing dates from SPSS as absolute numbers in seconds from 1582-10-14 and how to convert these values to actual dates, if I understand correctly. This post concerns R incorrectly interpreting some of the missing values of an SPSS date column as 1582-10-14.
Here is a screenshot of how the data looks in SPSS and a dput()
of how it looks after importing to R.
datetest <- structure(list(datevariable = structure(c(NA, 18394,
NA, 17914, -141428, -141428, NA, -141428, NA, NA, NA,
-141428, NA, NA, -141428, -141428, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 17765, NA, NA,
NA, NA, NA, NA, NA, -141428, NA, NA, NA, NA, NA, NA, NA,
19278, NA, NA, NA, NA, NA, NA, NA, 19121, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), label = "datevariable", class = "Date", format.spss = "SDATE10")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -77L))
As you can see row numbers 2 and 4 are imported correctly. The missing values of row 5,6,8,12,15... are imported as 1582-10-14.