I am working on an issue similar to the one described in this other posting. I adapted the code from that post to select the columns I am interested in and to fit my data file.
My issue, however, is that the resulting file has become larger than the original, and I'm not sure the code is working the way I intended.
When I open it in SPSS, the dataset seems to contain the header line followed by millions of copies of the second line, without end (I had to force-stop the process).
I noticed there's no counter in the while loop specifying the line; might this be the cause? My background in programming with R is very limited. The file is a .csv of 4.8 GB with 329 variables and millions of rows. I only need to keep around 30 of the variables.
This is the code I used:
##Open separate connections to hold cursor position
file.in <- file('npidata_20050523-20130707.csv', 'rt')
file.out<- file('Mainoutnpidata.txt', 'wt')
line<-readLines(file.in,n=1)
line.split <-strsplit(line, ',')
##Column picking for the header line (columns 1-11, 23-25, 31-33, 308-311)
cat(line.split[[1]][1:11],line.split[[1]][23:25], line.split[[1]][31:33], line.split[[1]][308:311], sep = ",", file = file.out, fill= TRUE)
##Use a loop to read in the rest of the lines
line <- readLines(file.in, n = 1)
while (length(line)) {
  line.split <- strsplit(line, ',')
  if (length(line.split[[1]]) > 1) {
    cat(line.split[[1]][1:11], line.split[[1]][23:25], line.split[[1]][31:33], line.split[[1]][308:311], sep = ",", file = file.out, fill = TRUE)
  }
}
close(file.in)
close(file.out)
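
For comparison, here is a minimal sketch of how the loop might be restructured so that it advances through the file. In the code above, `readLines()` is called only once before the `while`, so `line` never changes inside the loop and the same row is written over and over. Separately, `cat(..., fill = TRUE)` wraps output at the console width, which can break long rows across lines. The function name `filter_columns` and its arguments are hypothetical, chosen only for illustration. The sketch also assumes that no field contains a quoted comma; if the data has such fields, a plain split on "," will misalign the columns.

```r
# Sketch only: filter_columns() and its arguments are hypothetical names.
filter_columns <- function(path.in, path.out, keep, chunk.size = 10000) {
  file.in  <- file(path.in, "rt")
  file.out <- file(path.out, "wt")
  # Close both connections even if an error occurs part-way through
  on.exit({ close(file.in); close(file.out) })

  repeat {
    # Each call continues from where the previous read stopped
    lines <- readLines(file.in, n = chunk.size)
    if (length(lines) == 0) break  # end of file reached

    # Assumption: no quoted commas inside fields
    fields <- strsplit(lines, ",", fixed = TRUE)
    # Skip blank or single-field lines, as the original code does
    fields <- fields[lengths(fields) > 1]

    # Keep the wanted columns and rebuild one comma-separated row per line
    kept <- vapply(fields, function(f) paste(f[keep], collapse = ","), character(1))
    writeLines(kept, file.out)
  }
  invisible(NULL)
}
```

With the columns from the code above, this could be called as `filter_columns('npidata_20050523-20130707.csv', 'Mainoutnpidata.txt', keep = c(1:11, 23:25, 31:33, 308:311))`. Reading in chunks rather than one line at a time is only a speed choice; setting `chunk.size = 1` keeps the line-by-line behaviour.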