Error: corrupt data frame when using dplyr::bind_rows after updating readr from 0.1.1 to 0.2.0

Question

I just updated the readr package from version 0.1.1 to 0.2.0 but now an operation that worked before throws an error.

Before updating I did this using the readr package:

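library(readr)
# pattern is a regular expression, not a glob
file.list <- list.files(<path>, pattern = "\\.csv$")

df.list <- lapply(file.list, read_csv2)
# drop the spurious first row caused by the long headers
df.list <- lapply(df.list, function(x) x[-1, ])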

The last step is necessary because I have some long headers with special characters that somehow cause an extra line to be read. That is a separate issue, and simply deleting the first row had worked until now.

read_csv2 warns me about the issue with the column names but, as noted, I worked around this by deleting the row:

Warning: 1 parsing failure.
  row col    expected      actual
   1  -- 227 columns 222 columns

I then went on to bind all data frames into one using dplyr::bind_rows (as each .csv has identical headers). This worked perfectly before but now when I do this I get

> full.data <- bind_rows(df.list)
Error: corrupt data frame

I have not changed anything else (same R version, same RStudio version, no other package was updated). Has anyone experienced anything similar? Has any dramatic change been made to the way read_csv2 works compared to version 0.1.1?

Thanks

Perhaps you can use `lapply(df.list, problems)` to get more info? — Axeman, Oct 21 '15 at 10:11
Could you try `rbindlist` from `data.table`? ([see here for an example](http://stackoverflow.com/questions/32888757/reading-multiple-files-into-r-best-practice/32888918#32888918)) If that works, then it's probably a bug. If that also doesn't work, then the cause is most likely in your data ... — Jaap, Oct 21 '15 at 14:15
`rbindlist` works perfectly. Any idea where the bug might hide? — Manuel R, Oct 21 '15 at 17:25

score 6 · Accepted Answer · answered Oct 23 '15 at 10:03

Apparently the cause of my issue is that, since readr version 0.2.0, empty cells in the original .csv file are automatically converted to NA. This is probably what you want 99% of the time, provided all your headers are non-missing, but it is rather tedious when one of your column headers is empty. In fact, my original files did contain empty headers (these files were really not what you would call "tidy").

So after reading my .csv files via lapply(file.list, read_csv2), I had at least one column per data.frame with NA as its column header, which bind_rows(df.list) really didn't like. That's probably reasonable, as NA should never be a column header anyway. However, as noted here, I think readr should have some option to address empty column headers (or at least throw a meaningful warning), especially since it is also the reason for another error, as I mentioned here.
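
For anyone hitting the same thing, here is a minimal sketch of one possible workaround: find the NA (or empty) column names and replace them with placeholders before binding. The helper name `fix_names` and the `unnamed_` prefix are my own inventions, not part of readr or dplyr, and the toy data frames only stand in for what `lapply(file.list, read_csv2)` returned in my case. The final step uses base `rbind` so the snippet runs without extra packages; I would expect `bind_rows` to accept the cleaned list as well, but treat that as an assumption.

```r
# Hypothetical helper: replace missing or empty column names with placeholders
fix_names <- function(df) {
  nm <- names(df)
  bad <- is.na(nm) | nm == ""               # TRUE for NA or "" headers
  nm[bad] <- paste0("unnamed_", which(bad)) # placeholder based on column position
  names(df) <- make.unique(nm)              # guard against accidental duplicates
  df
}

# Toy data standing in for the list of data frames read from the .csv files
df.list <- list(
  data.frame(a = 1:2, b = 3:4),
  data.frame(a = 5:6, b = 7:8)
)
# Simulate the empty header: the second column name becomes NA
df.list <- lapply(df.list, function(x) {
  names(x)[2] <- NA_character_
  x
})

# Diagnose: which column positions have an NA header in each data frame?
na.cols <- lapply(df.list, function(x) which(is.na(names(x))))

# Repair the names, then bind
df.list <- lapply(df.list, fix_names)
full.data <- do.call(rbind, df.list)
```

Because the placeholder is derived from the column position, files with identical layouts end up with identical names, so the columns still line up when the data frames are bound.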
