I have a large file (data.txt, 35 GB) with 3 columns. A sample of the file looks like this:
... ... ...
5 701565 8679.56
8 1.16201e+006 3193.18
1 1.16173e+006 4457.85
14 1.16173e+006 4457.85
9 1.77942e+006 7208.73
4 1.78011e+006 8239.88
14 1.78019e+006 8195.57
9 2.00206e+006 8858.55
4 2.00199e+006 7924
... ... ...
I want to plot a histogram of the 3rd column for the rows where the value in the 2nd column is between 0 and 50'000.
Then I want another histogram for the rows where the value in the 2nd column is between 50'000 and 100'000, and so on.
I don't know how to read only the data I need for each histogram. Any help would be appreciated!
If I should use the sqldf package, my question is how to specify that the value of the 2nd column should be smaller than, e.g., 50'000.
The difference from "How do I read only lines that fulfil a condition from a csv into R?" is that my file has no column names, so I cannot do what the solution there proposes:
sql = "select * from file where Sepal.Length > 5"