I'm trying to load only the 3rd column from all CSV files in a folder. Each file has 20,000 rows and 4 columns, and I'd like to end up with a data frame of n columns (n = number of files; 6 in this case) and 20,000 rows. I tried using `colClasses` to specify the columns I want to load, but without success. Also, I can't get it to create n columns rather than 4 columns (where each column represents one variable found in all of the files). I'm trying to get a 20,000 × 6 data frame where each column holds the specified variable (the 3rd column) from one file. Any suggestions?
Maybe something like `rbindlist(sapply(files, fread, header=TRUE, nrows=r, select=3:5))`? With the `select` argument you can choose which columns are loaded. You can also drop columns with the `drop` argument. See also [*"Only read limited number of columns in R"*](http://stackoverflow.com/questions/5788117/only-read-limited-number-of-columns-in-r/33201353#33201353) and [*"Reading multiple files into R - Best practice"*](http://stackoverflow.com/questions/32888757/reading-multiple-files-into-r-best-practice/32888918#32888918) for more info – Jaap Mar 05 '16 at 09:42
2 Answers
2
Like I said in my comment, you can use the `select` or `drop` arguments of `fread`.
An alternative solution is to read the files with `sapply` and use the `idcol` parameter of `rbindlist` to create an id column. Next, you reshape the dataset to wide format as follows (you will need data.table 1.9.7 for this):
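library(data.table)
# simplify = FALSE keeps a named list of data.tables, so the file names become the id values
DT <- rbindlist(sapply(files, fread, header = TRUE, nrows = r, select = 3, simplify = FALSE), idcol = "id")
# one row per row position, one column per file
dcast(DT, rowid(id) ~ id, value.var = "name-of-selected-column")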
The result is a data.table with the file names as column names.
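To make the long-to-wide step concrete, here is a minimal, self-contained sketch. The temporary folder, the file names, and the column name `value` are all made up for illustration; with real data you would point `files` at your own folder and use the actual name of the 3rd column in `value.var`. It assumes a data.table version that provides `rowid()`.

```r
library(data.table)

# Hypothetical setup: write three small 4-column CSV files to a temp folder
dir <- file.path(tempdir(), "csv_wide_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  fwrite(data.table(a = 1:5, b = letters[1:5], value = i * (1:5), d = 5:1),
         file.path(dir, sprintf("file%d.csv", i)))
}

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)  # list names become the id values

# Read only the 3rd column of each file and stack the results in long format
DT <- rbindlist(lapply(files, fread, header = TRUE, select = 3), idcol = "id")

# Reshape to wide: one row per row position, one column per file
wide <- dcast(DT, rowid(id) ~ id, value.var = "value")
print(wide)
```

Here `lapply` over a named vector is used instead of `sapply`; both give `rbindlist` a named list, which is what `idcol` needs in order to label the rows by file.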

Jaap
1
Try using the `select` argument of `fread` to keep columns, or `drop` to skip columns, rather than `colClasses` (with data.table 1.9.6).
Something like this:
ltab <- lapply(files, fread, header = TRUE, select = 3, nrows = r)
You should obtain a list of 6 tables, each with 20,000 rows and 1 column.
Then `Tab <- do.call("cbind", ltab)` should work.
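As a rough end-to-end sketch of this approach: the temporary folder, file names, and column contents below are invented for the example, and renaming the columns after the source files is an extra step that is not part of the original suggestion.

```r
library(data.table)

# Hypothetical setup: write three small 4-column CSV files to a temp folder
dir <- file.path(tempdir(), "csv_cbind_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  fwrite(data.table(a = 1:5, b = letters[1:5], c = i * (1:5), d = 5:1),
         file.path(dir, sprintf("file%d.csv", i)))
}

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read only the 3rd column of each file; each element is a one-column data.table
ltab <- lapply(files, fread, header = TRUE, select = 3)

# Bind the one-column tables side by side
Tab <- do.call("cbind", ltab)

# All columns share the same name at this point, so rename them by position
setnames(Tab, tools::file_path_sans_ext(basename(files)))

print(dim(Tab))
```

Note that `cbind` matches rows purely by position, so this only makes sense when every file has the same number of rows in the same order, as in the question.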

cderv