Probably answered many times before (e.g., by me), but here's some data
fl = tempfile()
dim(mtcars)
write.csv(mtcars, file=fl)
Use a connection to open the file, then read in 10 rows
fin = file(fl, open="r")
nrows <- 10
data <- read.csv(fin, nrows=nrows) # first chunk
Remember the column names and classes
col.names <- names(data) # remember column names and...
colClasses <- sapply(data, class) # ... column classes
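Then process the chunk and read in the next one, supplying the column names and classes explicitly because later chunks have no header line. Stop reading when there's no more data. Note that the classes are inferred from the first chunk only, so a very small first chunk can guess wrong (e.g., integer for a column that holds decimals further down).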
repeat {
    ## process data...
    cat("Read", nrow(data), "rows\n")
    ## ...then read the next chunk
    data <- read.csv(fin, header=FALSE, colClasses=colClasses,
                     col.names=col.names, nrows=nrows)
    if (nrow(data) == 0) # done yet?
        break
}
close(fin) # release the connection when finished
mtcars has 32 rows, and we see
Read 10 rows
Read 10 rows
Read 10 rows
Read 2 rows
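The loop above can be wrapped in a small helper so the bookkeeping is written once. This is only a sketch: process_chunks and its FUN argument are hypothetical names, not part of base R, and it assumes the first chunk is large enough for read.csv to infer the right column classes.

```r
## Hypothetical helper: apply FUN to successive chunks of a CSV file
## and return the per-chunk results as a list.
process_chunks <- function(path, nrows, FUN) {
    con <- file(path, open = "r")
    on.exit(close(con))                       # always release the connection

    chunk <- read.csv(con, nrows = nrows)     # first read consumes the header
    col.names <- names(chunk)                 # remember column names and...
    colClasses <- sapply(chunk, class)        # ...classes from the first chunk

    results <- list()
    while (nrow(chunk) > 0) {
        results[[length(results) + 1L]] <- FUN(chunk)
        ## later chunks have no header, so supply names and classes
        chunk <- read.csv(con, header = FALSE, col.names = col.names,
                          colClasses = colClasses, nrows = nrows)
    }
    results
}

fl <- tempfile()
write.csv(mtcars, file = fl)
chunk_sizes <- unlist(process_chunks(fl, 10, nrow))
print(chunk_sizes)
```

Using on.exit() means the connection is closed even if FUN fails partway through the file.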
We can verify that each chunk has the correct header and that the columns have consistent classes. Factors can still cause trouble, because their levels may differ from chunk to chunk, especially when reading small chunks; passing stringsAsFactors=FALSE (the default since R 4.0.0) is probably appropriate.
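One way to do that check, as a rough sketch, is to keep every chunk, compare its names and classes with those of the first chunk, and then compare the reassembled data with a single read of the whole file. The variable names (chunks, combined, whole, same_names, same_classes, same_data) are just illustrative, and keeping all chunks in memory defeats the purpose of chunked reading, so this is only for testing on a small file.

```r
fl <- tempfile()
write.csv(mtcars, file = fl)

fin <- file(fl, open = "r")
chunk <- read.csv(fin, nrows = 10, stringsAsFactors = FALSE)
col.names <- names(chunk)
colClasses <- sapply(chunk, class)

chunks <- list()
while (nrow(chunk) > 0) {
    chunks[[length(chunks) + 1L]] <- chunk
    chunk <- read.csv(fin, header = FALSE, col.names = col.names,
                      colClasses = colClasses, nrows = 10,
                      stringsAsFactors = FALSE)
}
close(fin)

## every chunk should carry the same header and column classes
same_names <- all(vapply(chunks, function(x) identical(names(x), col.names),
                         logical(1)))
same_classes <- all(vapply(chunks,
                           function(x) identical(sapply(x, class), colClasses),
                           logical(1)))

## the reassembled chunks should match the file read in one go
combined <- do.call(rbind, chunks)
whole <- read.csv(fl, stringsAsFactors = FALSE)
same_data <- all(vapply(names(whole), function(nm)
    isTRUE(all.equal(combined[[nm]], whole[[nm]])), logical(1)))

print(c(same_names, same_classes, same_data))
```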