
I am using the LaF package to read a large pipe-delimited file with 135M rows and 22 columns (~15 GB of raw data). Unfortunately, the raw file has random header notes in its first 4 lines, followed by the column headers.

Edit: I am sorry, I should have mentioned earlier: I am on Windows Server 2012 R2.

The data is as follows:

gpg: encrypted with 1024-bit ELG key, ID XXXXXXXX, created 2006-10-30
***email id*** 
gpg: encrypted with 2048-bit RSA key, ID XXXXXXXX, created 2014-12-05
***email id*** 
COLUMN HEADERS (22) 
DATA 
. 
. 
.

I can build the model properly by skipping the first 4 lines:

modelF1 <- detect_dm_csv("trxn_.txt", sep="|", header=TRUE, nrows=10000, skip=4)
dfF1Laf <- laf_open(modelF1)

But when I try to skip the first 4 lines using `goto`, it gives me the following error:

goto(dfF1Laf,6)

Error in goto(dfF1Laf , 6) : Line has too many columns

How do I get around this?

I need to be able to summarize the data, which is why I chose this package; it seemed well suited to my purpose. I have tried ffdf and data.table::fread, but they were either too slow or could not fit the data in RAM.

I am open to using other packages as well.

  • Take a look at the `iotools` package. – lmo Dec 08 '16 at 14:25
  • what OS are you on? Can you use `tail --lines=+4` in the shell? http://stackoverflow.com/questions/604864/print-a-file-skipping-x-lines-in-bash – Ben Bolker Dec 08 '16 at 14:33
  • I am on Windows Server 2012 R2 – SatZ Dec 08 '16 at 15:20
  • I was really short on time and had no option but to tamper with the input raw file. I used _cygwin_ and _sed_ to get rid of the first 4 lines. Installed cygwin and used `sed -i 1,4d myfile.txt` – SatZ Dec 09 '16 at 09:06
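For anyone hitting the same wall: the junk lines can be stripped outside R before LaF ever sees the file, as the comments suggest. Below is a minimal sketch of the two shell approaches mentioned above, run against a small stand-in file (`trxn_demo.txt` is a made-up name, not the real data). It assumes a GNU coreutils environment such as Cygwin or Git Bash on Windows. One off-by-one to watch for: `tail -n +N` starts printing *at* line N, so skipping the first 4 lines needs `+5`, not `+4`.

```shell
# Build a small stand-in for the real file: 4 junk header lines,
# then a pipe-delimited column-header row and two data rows.
printf 'gpg note 1\nemail id\ngpg note 2\nemail id\n' >  trxn_demo.txt
printf 'a|b|c\n1|2|3\n4|5|6\n'                        >> trxn_demo.txt

# Option 1: tail -n +5 starts at line 5, i.e. drops the first 4 lines,
# and writes the result to a new file, leaving the original untouched.
tail -n +5 trxn_demo.txt > trxn_clean.txt

# Option 2 (equivalent, edits the file in place like the sed command
# in the comment above):
#   sed -i '1,4d' trxn_demo.txt

# The cleaned file now starts with the column headers.
head -n 1 trxn_clean.txt
```

Either way, the cleaned file can then be fed to `detect_dm_csv` / `laf_open` without any `skip` gymnastics, and `goto` should no longer trip over the malformed header lines.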

0 Answers