R data.table: using fread on all .csv files in folder skipping the last line of each

Question

I have hundreds of .csv files I need to read in using fread and save as one data table. The basic structure is the same for each .csv. There is header info that needs to be skipped (easy using skip = ). I am having difficulty with skipping the last line of each .csv file. Each .csv file has a different number of rows.

If I have only one file in the Test folder, this script perfectly skips the first rows (using skip = ) and the last row (using nrows = ):

file <- list.files("Q:/Test/", full.names=TRUE)
all <- fread(file, skip = 7, select = c(1:7,9),
             nrows = length(readLines(file))-9)

When saving multiple files in the Test folder, this is the code I tried:

file <- list.files("Q:/Test/", full.names=TRUE)
L <- lapply(file, fread, skip = 7, select = c(1:7,9),
        nrows = length(readLines(file))-9)
dt <- rbindlist(L)

It doesn't create L and gives me this error:

Error in file(con, "r") : invalid 'description' argument

Any ideas on how to skip the last row of each .csv when each .csv has a different number of rows?

I am using data.table version 1.9.6. Thanks.

Don't use `readLines`, that wastes a lot of effort. Try the approach here: http://stackoverflow.com/questions/3137094/how-to-count-lines-in-a-document — MichaelChirico, Apr 11 '16 at 20:31
Perhaps `nrows` could accept a negative value to skip lines from the bottom of the file. Filed [#1643](https://github.com/Rdatatable/data.table/issues/1643). — Arun, Apr 11 '16 at 21:04
Maybe `head -n-1` passed to `fread` directly. Or a `grep -v` to remove the trailing footer text. See section 1 of [this new page](https://github.com/Rdatatable/data.table/wiki/Convenience-features-of-fread). — Matt Dowle, Apr 11 '16 at 21:19
Also [this answer](http://stackoverflow.com/a/35786076/403310) might help. — Matt Dowle, Apr 11 '16 at 21:22
@MichaelChirico I like this approach and am trying to work it out. I use Rstudio on Windows 7 so I believe I need to use Cygwin. So far I haven't been able to make it work. — FG7, Apr 12 '16 at 16:37
@Arun Thanks. It would be a great addition to data.table if not too difficult to implement. — FG7, Apr 12 '16 at 16:39
@MattDowle Thank you. From your other linked answers, I need to install Cygwin on my Windows 7 machine. Still working on getting it to work properly. I still only get error messages but I believe the issue is a Cygwin/Windows problem. Your suggestions should work. — FG7, Apr 12 '16 at 16:44
@FG7 did you add the Cygwin bin to your PATH? What are the error messages? Never heard of persistent problems before and it's widely used. — Matt Dowle, Apr 12 '16 at 18:27

Gautam · Answer 1 · 2020-05-03T03:14:31.400

It's a bit late, but here's what worked for me:

library(data.table)

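# full.names = TRUE returns paths that fread can open from any working directory
fnames <- dir("path", pattern = "\\.csv$", full.names = TRUE)

read_data <- function(z){
  # skip and select match my files; adjust them to your own layout
  dat <- fread(z, skip = 1, select = 1)
  # drop the last row (the footer); also safe when only one row was read
  head(dat, -1)
}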

datalist <- lapply(fnames, read_data)

bigdata <- rbindlist(datalist, use.names = TRUE)

Here "path" is the directory that holds the files. This assumes the column names are the same in every file; if they are not, you can set the column names of bigdata afterwards using names. Hope this helps!
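For the layout described in the question (7 lines to skip, then a column-header line, then data, then one footer line), the error most likely comes from `readLines(file)` being evaluated on the whole vector of file names instead of one file at a time. A minimal sketch of a per-file version is below. It is not tested against the original files: `read_one` and `read_folder` are hypothetical helper names, and it assumes line 8 of each file is the column header and the footer is exactly one line.

```r
library(data.table)

# Hypothetical helper: read a single file, ignoring the preamble and the footer.
# Assumes `skip` preamble lines, then one column-header line, then the data,
# then `footer` trailing lines that are not data.
read_one <- function(path, skip = 7, footer = 1, select = NULL) {
  # Count the lines of THIS file only (the question's code counted all files at once)
  n_lines <- length(readLines(path))
  # Data rows = total lines - preamble - header line - footer
  n_data <- n_lines - skip - 1 - footer
  fread(path, skip = skip, header = TRUE, nrows = n_data, select = select)
}

# Hypothetical helper: apply read_one to every .csv in a folder and stack the results.
read_folder <- function(dir_path, ...) {
  files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)
  rbindlist(lapply(files, read_one, ...), use.names = TRUE)
}
```

With the question's settings this would be called as `dt <- read_folder("Q:/Test/", select = c(1:7, 9))`. Counting lines with `readLines` reads each file twice, so for very large files a cheaper line count (as suggested in the comments) may be worth the extra setup.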
