I am using the LaF package to read a large pipe-delimited file: 135M rows and 22 columns, ~15 GB of raw data. Unfortunately, the raw file has random header notes in the first 4 lines, followed by the column headers.
Edit: I am sorry I should have mentioned earlier, I am on Windows Server 2012 R2
The data is as follows:
gpg: encrypted with 1024-bit ELG key, ID XXXXXXXX, created 2006-10-30
***email id***
gpg: encrypted with 2048-bit RSA key, ID XXXXXXXX, created 2014-12-05
***email id***
COLUMN HEADERS (22)
DATA
.
.
.
I can detect the data model properly by skipping the first 4 lines:
modelF1 <- detect_dm_csv("trxn_.txt", sep="|", header=TRUE, nrows=10000, skip=4)
dfF1Laf <- laf_open(modelF1)
But when I try to skip the first 4 lines using goto, it gives me the following error:
goto(dfF1Laf,6)
Error in goto(dfF1Laf , 6) : Line has too many columns
How do I get around this?
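One workaround I am considering (a sketch, not tested on the full file): since the model already skips the 4 note lines but goto still seems to trip over them, strip those lines once into a clean copy of the file, then open the clean file so line numbers refer to actual data. File names here are placeholders for my real paths:

```r
library(LaF)

# Hypothetical one-time cleanup: stream-copy the file minus the
# first 4 gpg/email note lines, in chunks, to avoid loading 15 GB.
infile  <- "trxn_.txt"
outfile <- "trxn_clean.txt"

con_in  <- file(infile, "r")
con_out <- file(outfile, "w")
invisible(readLines(con_in, n = 4))      # discard the 4 note lines
while (length(chunk <- readLines(con_in, n = 1e6)) > 0) {
  writeLines(chunk, con_out)             # copy the rest, 1M lines at a time
}
close(con_in); close(con_out)

# The clean file starts at the column headers, so no skip is needed
modelF1 <- detect_dm_csv(outfile, sep = "|", header = TRUE, nrows = 10000)
dfF1Laf <- laf_open(modelF1)
goto(dfF1Laf, 6)                         # should now address data rows directly
```

This trades one sequential pass over the file for predictable goto behavior afterwards.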
I need to be able to summarize the data, which is why I chose this package; it seemed well suited to my purpose. I have tried ffdf and data.table::fread, but they were either too slow or could not fit the data in RAM.
I am open to using other packages as well.