This might be a beginners question and have quite a simple fix but I've been at it for a while and cant seem to figure it out. I have highfrequency data that has about 500,000 rows and 62 columns. I want to use fread() to make the reading more efficient but the problem is that not all rows are of the same length. Theres quote data and theres trade data, the trade lines have only 5 columns.
Here is my output when I read using read.csv:
> df<- read.csv(file = "AUROPHARMA15OCTFUT_20150916_ob.csv",header = FALSE,sep = "", col.names = c("DateTime","Seq","BP1","BQ1","BO1","AP1","AQ1","AO1","BP2","BQ2","BO2","AP2","AQ2","AO2","BP3","BQ3","BO3","AP3","AQ3","AO3","BP4","BQ4","BO4","AP4","AQ4","AO4","BP5","BQ5","BO5","AP5","AQ5","AO5","BP6","BQ6","BO6","AP6","AQ6","AO6","BP7","BQ7","BO7","AP7","AQ7","AO7","BP8","BQ8","BO8","AP8","AQ8","AO8","BP9","BQ9","BO9","AP9","AQ9","AO9","BP10","BQ10","BO10","AP10","AQ10","AO10"), colClasses = c(NA, rep("integer",31), rep("NULL", 30)))
> head(df,100)
DateTime Seq BP1 BQ1 BO1 AP1 AQ1 AO1 BP2 BQ2 BO2 AP2 AQ2 AO2 BP3 BQ3 BO3 AP3 AQ3 AO3 BP4 BQ4 BO4 AP4 AQ4 AO4 BP5 BQ5 BO5 AP5 AQ5 AO5
1 1.442375e+18 9905 70730 1 1 72830 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1.442375e+18 9940 70735 1 1 72830 1 1 70730 1 1 0 0 0 70285 1 1 0 0 0 70185 1 1 0 0 0 0 0 0 0 0 0
3 1.442375e+18 10427 70905 1 1 72830 1 1 70735 3 1 0 0 0 70730 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1.442375e+18 10452 70910 1 1 72830 1 1 70905 1 1 0 0 0 70735 2 1 0 0 0 70730 1 1 0 0 0 0 0 0 0 0 0
5 1.442375e+18 31223 70905 1 1 71805 1 1 70900 1 1 71810 1 1 70730 1 1 71815 1 1 70230 1 1 71925 3 1 70220 1 1 72075 4 1
6 1.442375e+18 68871 70910 1 1 71405 1 1 70905 2 1 71560 4 1 70900 1 1 71625 1 1 70795 1 1 71790 2 1 70730 1 1 71800 1 1
7 1.442375e+18 69041 70905 2 1 71405 1 1 70900 1 1 71560 4 1 70795 1 1 71625 1 1 70730 1 1 71790 2 1 70220 1 1 71800 1 1
8 1.442375e+18 70214 70910 1 1 71405 1 1 70905 2 1 71560 4 1 70900 1 1 71625 1 1 70795 1 1 71715 2 1 70730 1 1 71790 2 1
9 1.442375e+18 73954 70915 1 1 71405 1 1 70910 1 1 71560 4 1 70905 1 1 71625 1 1 70900 1 1 71715 2 1 70795 1 1 71800 1 1
10 1.442375e+18 82309 71000 1 1 71405 1 1 70915 1 1 71560 5 1 70905 1 1 71625 1 1 70785 1 1 71715 2 1 70730 1 1 71780 2 1
11 1.442375e+18 208143 71005 1 1 71500 1 1 71000 2 1 71700 4 1 70905 1 1 71705 4 1 70795 1 1 71800 1 1 70730 1 1 71840 2 1
12 1.442375e+18 208162 71010 1 1 71500 1 1 71005 1 1 71700 4 1 71000 2 1 71705 4 1 70905 1 1 71800 1 1 70795 1 1 71840 2 1
13 1.442375e+18 208216 71015 1 1 71500 1 1 71010 1 1 71700 4 1 71000 2 1 71705 4 1 70905 1 1 71800 1 1 70795 1 1 71840 2 1
14 1.442375e+18 220697 71000 1 1 71500 1 1 70905 1 1 71610 4 1 70795 1 1 71705 4 1 70760 2 1 71800 1 1 70730 1 1 71840 2 1
15 1.442375e+18 221546 71005 1 1 71500 1 1 71000 2 1 71610 4 1 70905 1 1 71705 4 1 70795 1 1 71800 1 1 70760 2 1 71840 2 1
16 1.442375e+18 242717 71000 1 1 71435 1 1 70905 1 1 71510 6 1 70785 1 1 71675 3 1 70735 2 1 71740 2 1 70730 1 1 71790 3 1
17 1.442375e+18 258996 71020 4 1 71435 1 1 71000 1 1 71510 6 1 70905 1 1 71570 4 1 70785 1 1 71740 2 1 70735 2 1 71790 3 1
18 1.442375e+18 410566 71025 1 1 71495 5 1 71020 4 1 71500 1 1 71000 1 1 71600 2 1 70905 1 1 71750 2 1 70840 1 1 71800 1 1
19 1.442375e+18 436429 71020 4 1 71495 4 1 71000 1 1 71500 1 1 70905 1 1 71655 4 1 70800 2 1 71790 2 1 70750 1 1 71800 1 1
20 1.442375e+18 539903 71025 1 1 71435 2 1 71020 4 1 71475 2 1 71000 1 1 71500 1 1 70905 1 1 71600 4 1 70900 1 1 71625 3 1
21 1.442375e+18 572089 71100 1 1 71435 2 1 71025 1 1 71475 2 1 71020 4 1 71500 1 1 71000 1 1 71600 4 1 70905 1 1 71635 2 1
22 1.442375e+18 588512 71105 1 1 71480 1 1 71100 1 1 71500 1 1 71025 1 1 71590 2 1 71020 4 1 71630 2 1 71000 1 1 71640 3 1
23 1.442375e+18 588658 71110 1 1 71500 1 1 71105 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71640 3 1 71000 1 1 71700 4 1
24 1.442375e+18 642054 71115 1 1 71500 1 1 71110 3 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71755 4 1 71000 1 1 71800 1 1
25 1.442375e+18 645889 71120 1 1 71500 1 1 71115 1 1 71590 2 1 71110 2 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
26 1.442375e+18 645894 71125 1 1 71500 1 1 71120 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
27 1.442375e+18 645897 71130 1 1 71500 1 1 71125 1 1 71590 2 1 71120 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
28 1.442375e+18 645903 71135 1 1 71500 1 1 71130 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
29 1.442375e+18 645911 71140 1 1 71500 1 1 71135 1 1 71590 2 1 71130 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
30 1.442375e+18 645914 71145 1 1 71500 1 1 71140 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
31 1.442375e+18 645917 71150 1 1 71500 1 1 71145 1 1 71590 2 1 71140 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
32 1.442375e+18 645925 71155 1 1 71500 1 1 71150 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
33 1.442375e+18 645928 71160 1 1 71500 1 1 71155 1 1 71590 2 1 71150 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
34 1.442375e+18 645938 71165 1 1 71500 1 1 71160 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
35 1.442375e+18 645966 71170 1 1 71500 1 1 71165 1 1 71590 2 1 71160 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
36 1.442375e+18 646010 71175 1 1 71500 1 1 71170 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
37 1.442375e+18 646028 71180 1 1 71500 1 1 71175 1 1 71590 2 1 71170 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
38 1.442375e+18 646064 71185 1 1 71500 1 1 71180 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
39 1.442375e+18 646077 71190 1 1 71500 1 1 71185 1 1 71590 2 1 71180 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
40 1.442375e+18 646107 71195 1 1 71500 1 1 71190 2 1 71590 2 1 71110 1 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
41 1.442375e+18 646132 71200 1 1 71500 1 1 71195 1 1 71590 2 1 71190 1 1 71630 2 1 71110 1 1 71755 4 1 71100 1 1 71800 1 1
42 1.442375e+18 649141 71205 1 1 71500 1 1 71200 3 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71755 4 1 71000 1 1 71800 1 1
43 1.442375e+18 649160 71210 1 1 71500 1 1 71205 1 1 71590 2 1 71200 2 1 71630 2 1 71100 1 1 71755 4 1 71020 4 1 71800 1 1
44 1.442375e+18 659724 71100 1 1 71500 1 1 71020 4 1 71590 2 1 71000 1 1 71630 2 1 70980 2 1 71800 1 1 70940 1 1 71820 2 1
45 1.442375e+18 659737 71105 1 1 71500 1 1 71100 1 1 71590 2 1 71020 4 1 71630 2 1 71000 1 1 71800 1 1 70980 2 1 71820 2 1
46 1.442375e+18 659778 71110 1 1 71500 1 1 71105 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
47 1.442375e+18 659796 71115 1 1 71500 1 1 71110 1 1 71590 2 1 71105 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
48 1.442375e+18 659814 71120 1 1 71500 1 1 71115 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
49 1.442375e+18 659832 71125 1 1 71500 1 1 71120 1 1 71590 2 1 71115 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
50 1.442375e+18 659855 71130 1 1 71500 1 1 71125 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
51 1.442375e+18 659868 71135 1 1 71500 1 1 71130 1 1 71590 2 1 71125 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
52 1.442375e+18 659882 71140 1 1 71500 1 1 71135 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
53 1.442375e+18 659910 71145 1 1 71500 1 1 71140 1 1 71590 2 1 71135 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
54 1.442375e+18 659938 71150 1 1 71500 1 1 71145 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
55 1.442375e+18 659956 71155 1 1 71500 1 1 71150 1 1 71590 2 1 71145 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
56 1.442375e+18 659967 71160 1 1 71500 1 1 71155 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
57 1.442375e+18 659978 71165 1 1 71500 1 1 71160 1 1 71590 2 1 71155 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
58 1.442375e+18 660009 71170 1 1 71500 1 1 71165 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
59 1.442375e+18 660017 71175 1 1 71500 1 1 71170 1 1 71590 2 1 71165 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
60 1.442375e+18 660036 71180 1 1 71500 1 1 71175 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
61 1.442375e+18 660059 71185 1 1 71500 1 1 71180 1 1 71590 2 1 71175 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
62 1.442375e+18 660075 71190 1 1 71500 1 1 71185 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
63 1.442375e+18 660096 71195 1 1 71500 1 1 71190 1 1 71590 2 1 71185 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
64 1.442375e+18 660119 71200 1 1 71500 1 1 71195 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
65 1.442375e+18 660130 71205 1 1 71500 1 1 71200 1 1 71590 2 1 71195 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
66 1.442375e+18 660174 71210 1 1 71500 1 1 71205 2 1 71590 2 1 71100 1 1 71630 2 1 71020 4 1 71800 1 1 71000 1 1 71820 2 1
67 1.442375e+18 660204 71215 1 1 71500 1 1 71210 1 1 71590 2 1 71205 1 1 71630 2 1 71100 1 1 71800 1 1 71020 4 1 71820 2 1
68 1.442375e+18 676325 71220 1 1 71500 1 1 71215 3 1 71660 2 1 71100 1 1 71700 2 1 71020 4 1 71800 1 1 71000 3 1 71825 2 1
69 1.442375e+18 679352 71215 3 1 71500 1 1 71100 1 1 71660 2 1 71020 4 1 71700 2 1 71000 3 1 71800 1 1 70985 1 1 71825 2 1
70 1.442375e+18 679855 71225 1 1 71500 1 1 71215 2 1 71660 2 1 71100 1 1 71700 2 1 71020 4 1 71800 1 1 71000 3 1 71825 2 1
71 1.442375e+18 685588 71230 1 1 71500 1 1 71225 1 1 71660 2 1 71100 1 1 71700 2 1 71020 4 1 71800 1 1 71015 2 1 71835 2 1
72 1.442375e+18 685600 71235 1 1 71500 1 1 71230 1 1 71660 2 1 71100 1 1 71700 2 1 71020 4 1 71800 1 1 71015 2 1 71835 2 1
73 1.442375e+18 685609 71230 1 1 71500 1 1 71100 1 1 71660 2 1 71020 4 1 71700 2 1 71015 2 1 71800 1 1 70985 1 1 71835 2 1
74 1.442375e+18 685611 71240 1 1 71500 1 1 71100 1 1 71660 2 1 71020 4 1 71700 2 1 71015 2 1 71800 1 1 70985 1 1 71835 2 1
75 1.442375e+18 685635 71245 1 1 71500 1 1 71240 1 1 71660 2 1 71230 1 1 71700 2 1 71100 1 1 71800 1 1 71020 4 1 71835 2 1
76 1.442375e+18 690309 71500 1 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
77 1.442375e+18 690309 71245 1 1 71660 2 1 71100 1 1 71700 2 1 71020 4 1 71800 1 1 71015 2 1 71835 2 1 70985 1 1 72000 1 1
78 1.442375e+18 690344 71250 1 1 71660 2 1 71245 1 1 71700 2 1 71100 1 1 71800 1 1 71020 4 1 71835 2 1 71015 2 1 72000 1 1
79 1.442375e+18 691085 71255 1 1 71655 1 1 71250 3 1 71660 2 1 71100 1 1 71700 2 1 71020 4 1 71800 1 1 71015 2 1 71835 2 1
As you can see row 76 is a trade line and has data only in 5 columns. read.csv handles the missing values correctly and inserts NAs for them, this is what I want fread to do if it is possible.
But when I run fread, this is what I get:
> fdf<- fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ", header = FALSE, na.strings = " ", data.table = FALSE)
Error in fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ", :
Expected sep (' ') but new line or EOF ends field 5 on line 76 when reading data: 1442375160144000000 690309 71500 1 1
Even when I specify the colClasses:
> fdf<- fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ", header = FALSE, na.strings = "", data.table = FALSE, colClasses = c(NA, rep("integer",31), rep("NULL", 30)))
Error in fread(input = "AUROPHARMA15OCTFUT_20150916_ob.csv", sep = " ", :
Expected sep (' ') but new line or EOF ends field 5 on line 76 when reading data: 1442375160144000000 690309 71500 1 1
Do I have to change my na.strings argument? If so, what should it be? Or is there some other way I should be tackling this issue? Any help, suggestions or alternatives will be appreciated. Thanks for the help.
Let me know if any more info is needed.