I am reading a csv file of ~50 MB with `read_feather` from the feather package.

Reading fails with the following error:

Error in .Call("feather_coldataFeather", PACKAGE = "feather", feather,  : 
negative length vectors are not allowed

I have not found a description of this error. I have read other files this way before without hitting it, so I am a bit stumped.

Thanks in advance for your hints.

ayhan
Dimon D.
  • My guess is that if the call works for other `csv` files but not this one, this file may have some problem. Could you check the file manually for correctness? – Tim Biegeleisen Sep 16 '16 at 11:24
  • See https://stat.ethz.ch/pipermail/r-help/2015-January/425051.html; your vector is probably too long. [This](http://stackoverflow.com/questions/36842263/memory-limits-in-data-table-negative-length-vectors-are-not-allowed) is probably related as well. – Cath Sep 16 '16 at 11:24
  • @Cath Thanks for the hint, but I am not sure it hit the limit. The actual table is about 82k x 151, so I am re-downloading a fresh copy. As far as I remember, I used to have 1.4 million rows and 35 columns and reading was OK. – Dimon D. Sep 16 '16 at 11:47
  • To see if it's a corrupt file (Tim's suggestion) or something R can't deal with (Cath's suggestion), can you try reading it with Python? Run `pip install feather-format` at the command line, then `import feather` and `feather.read_dataframe(path)` in Python code. – hrbrmstr Sep 16 '16 at 11:48
  • @hrbrmstr Thanks for the additional hint :-) I am going to try a new data set as a test. If that does not work out I will try Python. However, I would like to solve the issue in the R environment. – Dimon D. Sep 16 '16 at 11:51
  • Unfortunately R & Python are the only two non-Java environments that I know of that can help validate a Feather file and it's possible there's a bug in the R package (there was a bug in Feather itself dealing with huge files, but that's not your issue here). – hrbrmstr Sep 16 '16 at 11:55
  • @hrbrmstr I have rebuilt the dataset: removed all NAs and made sure that the numeric columns contain only numeric values. Now feather reads it fine. – Dimon D. Sep 20 '16 at 09:47
  • Nice. Glad it's sorted out, but I'm starting to wonder if R/Python + Feather is worth it vs R/Python + Spark + Parquet. The latter is _a lot_ of extra deps, but at least Parquet seems to be a more stable format (I'm going through this selection process for work, which is one of the reasons I'm bringing it up). – hrbrmstr Sep 20 '16 at 10:17
  • @hrbrmstr Thanks for the hint about the Parquet format. I will take it into account, but for now I am sticking with the good old csv format. I got interested in feather because of the performance tests I ran ([read/write from/to file](https://rpubs.com/demydd)) – Dimon D. Sep 20 '16 at 12:27
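
The "vector is too long" idea from the comments can be checked with a little arithmetic. Below is a minimal sketch in Python (the language suggested above for cross-checking), assuming the relevant limit is the largest value a 32-bit signed integer can hold. The helper `as_int32` and the variable names are made up for illustration and are not part of feather or R. It only shows that both table sizes mentioned in the thread are far below that limit, and how a length field that is too large (or corrupted) can come out negative. It does not establish what actually went wrong inside `read_feather`.

```python
import ctypes

INT32_MAX = 2**31 - 1  # assumed limit: largest 32-bit signed integer


def as_int32(n: int) -> int:
    """Reinterpret n as a 32-bit signed integer, wrapping on overflow."""
    return ctypes.c_int32(n).value


# Approximate table sizes quoted in the thread (rows x columns).
failing_cells = 82_000 * 151     # the table that raised the error
working_cells = 1_400_000 * 35   # the larger table that read fine

# Both cell counts fit comfortably below the 32-bit limit.
print(failing_cells, failing_cells <= INT32_MAX)
print(working_cells, working_cells <= INT32_MAX)

# A length one past the limit wraps around to a negative number.
print(as_int32(INT32_MAX + 1))
```

Since the failing table is smaller than the one that worked, plain size seems an unlikely explanation here, which fits with the problem going away once the dataset was rebuilt.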

0 Answers