1

This might be a beginners question and have quite a simple fix but I've been at it for a while and cant seem to figure it out. I have highfrequency data that has about 500,000 rows and 62 columns. I want to use fread() to make the reading more efficient but the problem is that not all rows are of the same length. Theres quote data and theres trade data, the trade lines have only 5 columns.

Here is my output when I read using read.csv:

> df<- read.csv(file = "AUROPHARMA15OCTFUT_20150916_ob.csv",header = FALSE,sep = "", col.names = c("DateTime","Seq","BP1","BQ1","BO1","AP1","AQ1","AO1","BP2","BQ2","BO2","AP2","AQ2","AO2","BP3","BQ3","BO3","AP3","AQ3","AO3","BP4","BQ4","BO4","AP4","AQ4","AO4","BP5","BQ5","BO5","AP5","AQ5","AO5","BP6","BQ6","BO6","AP6","AQ6","AO6","BP7","BQ7","BO7","AP7","AQ7","AO7","BP8","BQ8","BO8","AP8","AQ8","AO8","BP9","BQ9","BO9","AP9","AQ9","AO9","BP10","BQ10","BO10","AP10","AQ10","AO10"), colClasses = c(NA, rep("integer",31), rep("NULL", 30)))    
> head(df,100)
            DateTime    Seq   BP1 BQ1 BO1   AP1 AQ1 AO1   BP2 BQ2 BO2   AP2 AQ2 AO2   BP3 BQ3 BO3   AP3 AQ3 AO3   BP4 BQ4 BO4   AP4 AQ4 AO4   BP5 BQ5 BO5   AP5 AQ5 AO5
    1   1.442375e+18   9905 70730   1   1 72830   1   1     0   0   0     0   0   0     0   0   0     0   0   0     0   0   0     0   0   0     0   0   0     0   0   0
    2   1.442375e+18   9940 70735   1   1 72830   1   1 70730   1   1     0   0   0 70285   1   1     0   0   0 70185   1   1     0   0   0     0   0   0     0   0   0
    3   1.442375e+18  10427 70905   1   1 72830   1   1 70735   3   1     0   0   0 70730   1   1     0   0   0     0   0   0     0   0   0     0   0   0     0   0   0
    4   1.442375e+18  10452 70910   1   1 72830   1   1 70905   1   1     0   0   0 70735   2   1     0   0   0 70730   1   1     0   0   0     0   0   0     0   0   0
    5   1.442375e+18  31223 70905   1   1 71805   1   1 70900   1   1 71810   1   1 70730   1   1 71815   1   1 70230   1   1 71925   3   1 70220   1   1 72075   4   1
    6   1.442375e+18  68871 70910   1   1 71405   1   1 70905   2   1 71560   4   1 70900   1   1 71625   1   1 70795   1   1 71790   2   1 70730   1   1 71800   1   1
    7   1.442375e+18  69041 70905   2   1 71405   1   1 70900   1   1 71560   4   1 70795   1   1 71625   1   1 70730   1   1 71790   2   1 70220   1   1 71800   1   1
    8   1.442375e+18  70214 70910   1   1 71405   1   1 70905   2   1 71560   4   1 70900   1   1 71625   1   1 70795   1   1 71715   2   1 70730   1   1 71790   2   1
    9   1.442375e+18  73954 70915   1   1 71405   1   1 70910   1   1 71560   4   1 70905   1   1 71625   1   1 70900   1   1 71715   2   1 70795   1   1 71800   1   1
    10  1.442375e+18  82309 71000   1   1 71405   1   1 70915   1   1 71560   5   1 70905   1   1 71625   1   1 70785   1   1 71715   2   1 70730   1   1 71780   2   1
    11  1.442375e+18 208143 71005   1   1 71500   1   1 71000   2   1 71700   4   1 70905   1   1 71705   4   1 70795   1   1 71800   1   1 70730   1   1 71840   2   1
    12  1.442375e+18 208162 71010   1   1 71500   1   1 71005   1   1 71700   4   1 71000   2   1 71705   4   1 70905   1   1 71800   1   1 70795   1   1 71840   2   1
    13  1.442375e+18 208216 71015   1   1 71500   1   1 71010   1   1 71700   4   1 71000   2   1 71705   4   1 70905   1   1 71800   1   1 70795   1   1 71840   2   1
    14  1.442375e+18 220697 71000   1   1 71500   1   1 70905   1   1 71610   4   1 70795   1   1 71705   4   1 70760   2   1 71800   1   1 70730   1   1 71840   2   1
    15  1.442375e+18 221546 71005   1   1 71500   1   1 71000   2   1 71610   4   1 70905   1   1 71705   4   1 70795   1   1 71800   1   1 70760   2   1 71840   2   1
    16  1.442375e+18 242717 71000   1   1 71435   1   1 70905   1   1 71510   6   1 70785   1   1 71675   3   1 70735   2   1 71740   2   1 70730   1   1 71790   3   1
    17  1.442375e+18 258996 71020   4   1 71435   1   1 71000   1   1 71510   6   1 70905   1   1 71570   4   1 70785   1   1 71740   2   1 70735   2   1 71790   3   1
    18  1.442375e+18 410566 71025   1   1 71495   5   1 71020   4   1 71500   1   1 71000   1   1 71600   2   1 70905   1   1 71750   2   1 70840   1   1 71800   1   1
    19  1.442375e+18 436429 71020   4   1 71495   4   1 71000   1   1 71500   1   1 70905   1   1 71655   4   1 70800   2   1 71790   2   1 70750   1   1 71800   1   1
    20  1.442375e+18 539903 71025   1   1 71435   2   1 71020   4   1 71475   2   1 71000   1   1 71500   1   1 70905   1   1 71600   4   1 70900   1   1 71625   3   1
    21  1.442375e+18 572089 71100   1   1 71435   2   1 71025   1   1 71475   2   1 71020   4   1 71500   1   1 71000   1   1 71600   4   1 70905   1   1 71635   2   1
    22  1.442375e+18 588512 71105   1   1 71480   1   1 71100   1   1 71500   1   1 71025   1   1 71590   2   1 71020   4   1 71630   2   1 71000   1   1 71640   3   1
    23  1.442375e+18 588658 71110   1   1 71500   1   1 71105   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71640   3   1 71000   1   1 71700   4   1
    24  1.442375e+18 642054 71115   1   1 71500   1   1 71110   3   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71755   4   1 71000   1   1 71800   1   1
    25  1.442375e+18 645889 71120   1   1 71500   1   1 71115   1   1 71590   2   1 71110   2   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    26  1.442375e+18 645894 71125   1   1 71500   1   1 71120   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    27  1.442375e+18 645897 71130   1   1 71500   1   1 71125   1   1 71590   2   1 71120   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    28  1.442375e+18 645903 71135   1   1 71500   1   1 71130   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    29  1.442375e+18 645911 71140   1   1 71500   1   1 71135   1   1 71590   2   1 71130   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    30  1.442375e+18 645914 71145   1   1 71500   1   1 71140   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    31  1.442375e+18 645917 71150   1   1 71500   1   1 71145   1   1 71590   2   1 71140   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    32  1.442375e+18 645925 71155   1   1 71500   1   1 71150   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    33  1.442375e+18 645928 71160   1   1 71500   1   1 71155   1   1 71590   2   1 71150   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    34  1.442375e+18 645938 71165   1   1 71500   1   1 71160   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    35  1.442375e+18 645966 71170   1   1 71500   1   1 71165   1   1 71590   2   1 71160   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    36  1.442375e+18 646010 71175   1   1 71500   1   1 71170   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    37  1.442375e+18 646028 71180   1   1 71500   1   1 71175   1   1 71590   2   1 71170   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    38  1.442375e+18 646064 71185   1   1 71500   1   1 71180   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    39  1.442375e+18 646077 71190   1   1 71500   1   1 71185   1   1 71590   2   1 71180   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    40  1.442375e+18 646107 71195   1   1 71500   1   1 71190   2   1 71590   2   1 71110   1   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    41  1.442375e+18 646132 71200   1   1 71500   1   1 71195   1   1 71590   2   1 71190   1   1 71630   2   1 71110   1   1 71755   4   1 71100   1   1 71800   1   1
    42  1.442375e+18 649141 71205   1   1 71500   1   1 71200   3   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71755   4   1 71000   1   1 71800   1   1
    43  1.442375e+18 649160 71210   1   1 71500   1   1 71205   1   1 71590   2   1 71200   2   1 71630   2   1 71100   1   1 71755   4   1 71020   4   1 71800   1   1
    44  1.442375e+18 659724 71100   1   1 71500   1   1 71020   4   1 71590   2   1 71000   1   1 71630   2   1 70980   2   1 71800   1   1 70940   1   1 71820   2   1
    45  1.442375e+18 659737 71105   1   1 71500   1   1 71100   1   1 71590   2   1 71020   4   1 71630   2   1 71000   1   1 71800   1   1 70980   2   1 71820   2   1
    46  1.442375e+18 659778 71110   1   1 71500   1   1 71105   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    47  1.442375e+18 659796 71115   1   1 71500   1   1 71110   1   1 71590   2   1 71105   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    48  1.442375e+18 659814 71120   1   1 71500   1   1 71115   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    49  1.442375e+18 659832 71125   1   1 71500   1   1 71120   1   1 71590   2   1 71115   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    50  1.442375e+18 659855 71130   1   1 71500   1   1 71125   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    51  1.442375e+18 659868 71135   1   1 71500   1   1 71130   1   1 71590   2   1 71125   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    52  1.442375e+18 659882 71140   1   1 71500   1   1 71135   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    53  1.442375e+18 659910 71145   1   1 71500   1   1 71140   1   1 71590   2   1 71135   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    54  1.442375e+18 659938 71150   1   1 71500   1   1 71145   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    55  1.442375e+18 659956 71155   1   1 71500   1   1 71150   1   1 71590   2   1 71145   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    56  1.442375e+18 659967 71160   1   1 71500   1   1 71155   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    57  1.442375e+18 659978 71165   1   1 71500   1   1 71160   1   1 71590   2   1 71155   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    58  1.442375e+18 660009 71170   1   1 71500   1   1 71165   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    59  1.442375e+18 660017 71175   1   1 71500   1   1 71170   1   1 71590   2   1 71165   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    60  1.442375e+18 660036 71180   1   1 71500   1   1 71175   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    61  1.442375e+18 660059 71185   1   1 71500   1   1 71180   1   1 71590   2   1 71175   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    62  1.442375e+18 660075 71190   1   1 71500   1   1 71185   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    63  1.442375e+18 660096 71195   1   1 71500   1   1 71190   1   1 71590   2   1 71185   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    64  1.442375e+18 660119 71200   1   1 71500   1   1 71195   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    65  1.442375e+18 660130 71205   1   1 71500   1   1 71200   1   1 71590   2   1 71195   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    66  1.442375e+18 660174 71210   1   1 71500   1   1 71205   2   1 71590   2   1 71100   1   1 71630   2   1 71020   4   1 71800   1   1 71000   1   1 71820   2   1
    67  1.442375e+18 660204 71215   1   1 71500   1   1 71210   1   1 71590   2   1 71205   1   1 71630   2   1 71100   1   1 71800   1   1 71020   4   1 71820   2   1
    68  1.442375e+18 676325 71220   1   1 71500   1   1 71215   3   1 71660   2   1 71100   1   1 71700   2   1 71020   4   1 71800   1   1 71000   3   1 71825   2   1
    69  1.442375e+18 679352 71215   3   1 71500   1   1 71100   1   1 71660   2   1 71020   4   1 71700   2   1 71000   3   1 71800   1   1 70985   1   1 71825   2   1
    70  1.442375e+18 679855 71225   1   1 71500   1   1 71215   2   1 71660   2   1 71100   1   1 71700   2   1 71020   4   1 71800   1   1 71000   3   1 71825   2   1
    71  1.442375e+18 685588 71230   1   1 71500   1   1 71225   1   1 71660   2   1 71100   1   1 71700   2   1 71020   4   1 71800   1   1 71015   2   1 71835   2   1
    72  1.442375e+18 685600 71235   1   1 71500   1   1 71230   1   1 71660   2   1 71100   1   1 71700   2   1 71020   4   1 71800   1   1 71015   2   1 71835   2   1
    73  1.442375e+18 685609 71230   1   1 71500   1   1 71100   1   1 71660   2   1 71020   4   1 71700   2   1 71015   2   1 71800   1   1 70985   1   1 71835   2   1
    74  1.442375e+18 685611 71240   1   1 71500   1   1 71100   1   1 71660   2   1 71020   4   1 71700   2   1 71015   2   1 71800   1   1 70985   1   1 71835   2   1
    75  1.442375e+18 685635 71245   1   1 71500   1   1 71240   1   1 71660   2   1 71230   1   1 71700   2   1 71100   1   1 71800   1   1 71020   4   1 71835   2   1
    76  1.442375e+18 690309 71500   1   1    NA  NA  NA    NA  NA  NA    NA  NA  NA    NA  NA  NA    NA  NA  NA    NA  NA  NA    NA  NA  NA    NA  NA  NA    NA  NA  NA
    77  1.442375e+18 690309 71245   1   1 71660   2   1 71100   1   1 71700   2   1 71020   4   1 71800   1   1 71015   2   1 71835   2   1 70985   1   1 72000   1   1
    78  1.442375e+18 690344 71250   1   1 71660   2   1 71245   1   1 71700   2   1 71100   1   1 71800   1   1 71020   4   1 71835   2   1 71015   2   1 72000   1   1
    79  1.442375e+18 691085 71255   1   1 71655   1   1 71250   3   1 71660   2   1 71100   1   1 71700   2   1 71020   4   1 71800   1   1 71015   2   1 71835   2   1

As you can see row 76 is a trade line and has data only in 5 columns. read.csv handles the missing values correctly and inserts NAs for them, this is what I want fread to do if it is possible.

But when I run fread, this is what I get:

> fdf<- fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ", header = FALSE, na.strings = " ", data.table = FALSE)
Error in fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ",  : 
  Expected sep (' ') but new line or EOF ends field 5 on line 76 when reading data: 1442375160144000000 690309 71500 1 1

Even when I specify the colClasses:

> fdf<- fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ", header = FALSE, na.strings = "", data.table = FALSE, colClasses = c(NA, rep("integer",31), rep("NULL", 30)))
Error in fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ",  : 
  Expected sep (' ') but new line or EOF ends field 5 on line 76 when reading data: 1442375160144000000 690309 71500 1 1

Do I have to change my na.strings argument? If so, what should it be? Or is there some other way I should be tackling this issue? Any help, suggestions or alternatives will be appreciated. Thanks for the help.

Let me know if any more info is needed.

UtdMan
  • 151
  • 2
  • 12

1 Answers1

2

you can use na.strings = c(" ","\n") in your fread(), but it won't work in data.table version 1.9.6 . It is a known issue:

https://github.com/Rdatatable/data.table/issues/810

This appears to be fixed in data.table version 1.9.7 (not available in CRAN though). Update your data.table version from github and run the code with the above parameter changes.

Thanks to drat repositories updating data.table to devel version is now much easier, see Installation wiki for details.

jangorecki
  • 16,384
  • 4
  • 79
  • 160
vdep
  • 3,541
  • 4
  • 28
  • 54
  • I tried this but Im still getting the same error. And, yes I have data.table version 1.9.7 which I downloaded from github – UtdMan Mar 11 '16 at 03:45
  • yes, you can use other alternatives to `fread()` till 1.9.7 becomes stable. Refer @mnel answer to http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r – vdep Mar 11 '16 at 06:42
  • Yes, Ive seen that question before. I chose to do fread since its definitely the fastest. Ill have a look at the others. Any idea when 1.9.7 will be stable? – UtdMan Mar 11 '16 at 08:48
  • No. may be @jangorecki can answer it. – vdep Mar 11 '16 at 09:03