Looking for some help with an error I have not seen before and can't seem to find any help around it here.

I am trying to join two datasets and then keep the non-duplicated entries. However, when I run the code below:

alllawsuits <- allfjccases %>% 
inner_join(.,allprisonsnos) %>%
distinct(CASENAME,PLT,DEF,FILEDATE,TERMDATE,NOSedit,Docket,completename,.keep_all=T)

I receive the following error:

Joining, by = c("Docket", "NOSedit", "File.Year", "File.Month", "File.Day")
Error: Internal error in `date_validate()`: Corrupt `Date` with unknown type character.

The Date variable is not one of the merging variables and the code still does not run even if I exclude it from the two data frames. All the column types are the same across the two datasets and I have absolutely no idea how to fix this. Thoughts?

A couple other things: the code used to run fine in the past and merge works fine, just not any of the join functions from dplyr.

Werner Hertzog
  • Please can you post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your datasets? – stlba Sep 30 '20 at 18:02

1 Answer

Your error is referencing some unspecified tbl_df variable that has the Date class but contains character data. This may be, but is not necessarily, the variable named Date (that seems to be how you are interpreting the error, and that is a reasonable but incorrect interpretation of the error).

This error is being thrown by date_validate(), an internal function in newer versions of the vctrs package, which is a dependency of dplyr and tibble. vctrs is pickier about what constitutes a valid Date object than most operations in base R. It doesn't matter whether the Date class variable is part of the join key, because the column validation is performed when the new tbl_df object is created as a result of the join.

Normally a Date object is a numeric vector with the attribute class = "Date". For some reason there is a special S3 method specifically for preventing Date objects from returning e.g. is.numeric(my_date_object) = TRUE (look at the definition of base::is.numeric.Date() - this method supersedes dispatch to the primitive is.numeric function for Date objects) but they ARE normally numeric values "under the hood", so to speak. If we strip off the class attribute we can verify this.


> test <- as.Date(c("2020-01-01", "2020-01-02"))
> test
[1] "2020-01-01" "2020-01-02"
> str(test)
 Date[1:2], format: "2020-01-01" "2020-01-02"
> is.character(test)
[1] FALSE
> is.numeric(test)
[1] FALSE
> is.numeric(unclass(test))
[1] TRUE

However, it is also possible to create a Date object by explicitly assigning the Date class to a character vector where all the individual elements are coercible to numeric. The resulting Date object prints as if it were a normal Date object, but it is still a character vector:


> test <- structure(c("21424", "21425"), class = "Date")
> test
[1] "2028-08-28" "2028-08-29"
> str(test)
 Date[1:2], format: "2028-08-28" "2028-08-29"
> is.character(test)
[1] TRUE
> is.numeric(test)
[1] FALSE
> is.numeric(unclass(test))
[1] FALSE
> tibble(a = test)
  Error: Internal error in `date_validate()`: Corrupt `Date` with unknown type character.

And there's your error. Somewhere upstream of the code that you have shown, you performed some operation that created a nonstandard Date column. Either don't do that, or coerce the offending column to a normal date with something like

blah blah blah... %>%
  mutate(my_column = as.Date(as.numeric(my_column), origin = "1970-01-01")) %>%
  left_join(blah blah ...
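
If you don't know which column is the culprit, something along these lines may help track it down. This is only a sketch in base R: `find_corrupt_dates()` and `repair_corrupt_dates()` are hypothetical helper names, not functions from dplyr or vctrs, and the toy data frame is made up for illustration. It assumes the bad columns hold character strings that are coercible to numeric day counts, as in the example above.

```r
# Hypothetical helper: names of columns that carry the Date class
# but are stored as character rather than as numbers.
find_corrupt_dates <- function(df) {
  bad <- vapply(
    df,
    function(col) inherits(col, "Date") && typeof(col) == "character",
    logical(1)
  )
  names(df)[bad]
}

# Hypothetical helper: rebuild each flagged column as a standard Date,
# assuming its strings are day counts since 1970-01-01.
repair_corrupt_dates <- function(df) {
  for (nm in find_corrupt_dates(df)) {
    days <- as.numeric(unclass(df[[nm]]))
    df[[nm]] <- as.Date(days, origin = "1970-01-01")
  }
  df
}

# Toy data: one normal Date column and one hand-built character "Date".
toy <- data.frame(id = 1:2)
toy$good <- as.Date(c("2020-01-01", "2020-01-02"))
toy$bad <- structure(c("21424", "21425"), class = "Date")

find_corrupt_dates(toy)  # "bad"
fixed <- repair_corrupt_dates(toy)
typeof(fixed$bad)        # "double"
```

The check uses typeof() rather than is.character() or class() because typeof() reports the underlying storage type no matter which class attribute has been attached. Running the finder on both data frames before the join should show which column needs fixing.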

A lot of base R methods don't care about this because they perform implicit conversion before anything gets passed to a compiled function. vctrs, and dplyr verbs and joins by extension, do care about this. They achieve better performance by skipping implicit conversion for many operations. But as a tradeoff they have to be pickier about object types.

bcarlsen
  • +1. This solved the issue - I didn't think an oddly formatted date would prevent a merge, but it did. Thanks! – mollybreens Sep 30 '20 at 19:28
  • To be clear, the issue is not related to formatting. `format` in the context of R dates refers to how a date is printed to the console. The problem with your dates is that they have an unusual data type. – bcarlsen Oct 01 '20 at 15:00