
I am trying to read multiple CSV files using the fread function. However, the last row of each file contains unnecessary data, and fread throws an error when it reaches that row.

Code:

library(data.table)

# List every CSV file whose name contains "Star"
fnames <- list.files("Path", pattern = "Star.*\\.csv$", full.names = TRUE)

# Read a single file with fread
read_data <- function(z) {
  fread(z, verbose = TRUE, nrows = -1)
}

datalist <- lapply(fnames, read_data)

bigdata <- rbindlist(datalist, use.names = TRUE)

Error:

Error during wrapup: Expected sep (',') but new line, EOF (or other non printing character) ends field 4 when detecting types from point 10: 2704,IE,N,ENDOFFILEMARKER,5397786

The last row of each file contains the text ENDOFFILEMARKER.

Note:


  • I need to use fread as each data file is around 700 MB.

Ferdi
dharma
  • See [this](https://stackoverflow.com/q/36558437/1270695) perhaps, particularly the comments. – A5C1D2H2I1M1N2O1R2T1 Dec 27 '17 at 17:47
  • The general recommendation for now seems to be `fread("head -n -1 filename.csv")`. – A5C1D2H2I1M1N2O1R2T1 Dec 27 '17 at 17:47
  • Can I use this in a loop? – dharma Dec 28 '17 at 04:58
  • Sure.... Why not? – A5C1D2H2I1M1N2O1R2T1 Dec 28 '17 at 05:14
  • I am getting the following error. – dharma Dec 28 '17 at 17:51
  • Error in fread("head -n -1 filename.csv") : File not found: C:\Users\DHARMA~1.CHI\AppData\Local\Temp\5\RtmpiIAtfk\file25e46fdb6c44 In addition: Warning messages: 1: running command 'C:\Windows\system32\cmd.exe /c (head -n -1 filename.csv) > C:\Users\DHARMA~1.CHI\AppData\Local\Temp\5\RtmpiIAtfk\file25e46fdb6c44' had status 1 2: In shell(paste("(", input, ") > ", tt, sep = "")) : '(head -n -1 filename.csv) > C:\Users\DHARMA~1.CHI\AppData\Local\Temp\5\RtmpiIAtfk\file25e46fdb6c44' execution failed with error code 1 – dharma Dec 28 '17 at 17:51
  • Did you try changing filename.csv to the name of the csv file you are trying to read in using fread? Did you place the csv file in your working directory? – FG7 Dec 29 '17 at 03:05
  • What operating system are you using? – FG7 Dec 29 '17 at 03:38

1 Answer


Without seeing your csv files, it is difficult to determine the best answer. Perhaps try reading in one file first using fread. Using something like this may work:

dat <- fread("grep -v ENDOFFILEMARKER filename.csv")

where filename.csv is the name of one of your files placed in your working directory. The -v makes grep return all lines except lines containing the string ENDOFFILEMARKER. If you can get it working with one file, you can then work on applying similar logic to all of the files using lapply.
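As a rough sketch of that last step, the grep filter can be wrapped in a function and applied to every file with lapply. The helper names read_without_marker and read_all_without_marker are made up for illustration. The sketch assumes grep is on the PATH and that the installed data.table is recent enough for fread to accept a cmd argument (older versions take the command string as the first argument, as above).

```r
library(data.table)

# Hypothetical helper: read one file while dropping every line that
# contains the marker text. Assumes grep is available on the PATH and
# that this data.table version's fread() has the `cmd` argument.
read_without_marker <- function(path, marker = "ENDOFFILEMARKER") {
  # -F treats the marker as a fixed string; shQuote() protects paths with spaces
  cmd <- paste("grep -v -F", shQuote(marker), shQuote(path))
  fread(cmd = cmd)
}

# Hypothetical helper: read every file and stack the results into one table
read_all_without_marker <- function(paths, marker = "ENDOFFILEMARKER") {
  rbindlist(lapply(paths, read_without_marker, marker = marker),
            use.names = TRUE)
}
```

With the file list from the question, this would be called as bigdata <- read_all_without_marker(fnames).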

Another option that has worked for me is the readLines function. The downside is that readLines is somewhat slow, since it has to read the whole file just to count its lines, which is noticeable with files of around 700 MB. But if you can't find another way, readLines will work. Here's basically how I used it on one file:

length_a <- length(readLines("filename.csv"))
dt <- fread("filename.csv", nrows = length_a-1)

Once you have it working for one file, you can then figure out how to use it with a loop for all your files.
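One detail to watch when generalising this: fread's nrows counts data rows only, so if the files have a header line, the header has to be subtracted as well as the marker line. A minimal sketch, assuming a header is present by default; rows_before_last_line and read_without_last_line are hypothetical names, not data.table functions:

```r
# Hypothetical helper: number of data rows to request from fread() so that
# the final line of the file is left out. readLines() counts every line,
# including the header, so the header is subtracted when present.
rows_before_last_line <- function(path, header = TRUE) {
  n_lines <- length(readLines(path))
  n_lines - 1L - as.integer(header)
}

# Hypothetical helper: read one file without its last line.
# Assumes the data.table package is installed.
read_without_last_line <- function(path, header = TRUE) {
  data.table::fread(path,
                    nrows = rows_before_last_line(path, header),
                    header = header)
}
```

To cover all files, something like rbindlist(lapply(fnames, read_without_last_line), use.names = TRUE) should do, at the cost of reading each file twice.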

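I understand that fread("head -n -1 filename.csv") is the generally accepted method of skipping the last line, but I have never been able to get it to work properly. It relies on GNU head, so it fails where head is missing (as in plain Windows) or where head does not accept negative line counts.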

Edit: If you are using Windows, this may work for you:

 dat <- fread('findstr /V /C:"ENDOFFILEMARKER" filename.csv')

grep works well if you are using Linux or have Linux tools installed on your Windows machine. On Windows, the findstr command plays a similar role to grep. The /V switch returns all lines except those containing ENDOFFILEMARKER. The /C:"..." switch treats the quoted text as a single literal search string, so it can match one word exactly or a phrase that contains spaces.
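If the same script has to run on both kinds of machine, the choice between the two commands could be made from R. This is only a sketch: marker_filter_cmd is a hypothetical helper, the os_type argument exists purely to make it easy to try out, and quoting the file name for findstr is an assumption that goes slightly beyond the command shown above.

```r
# Hypothetical helper: build the shell command that drops marker lines,
# using findstr on Windows and grep elsewhere.
marker_filter_cmd <- function(path, marker = "ENDOFFILEMARKER",
                              os_type = .Platform$OS.type) {
  if (os_type == "windows") {
    # /V inverts the match; /C:"..." is a literal search string
    sprintf('findstr /V /C:"%s" "%s"', marker, path)
  } else {
    # -v inverts the match; -F treats the marker as a fixed string
    sprintf("grep -v -F %s %s",
            shQuote(marker, type = "sh"), shQuote(path, type = "sh"))
  }
}

# Usage (assumes data.table is installed and its fread() has `cmd`):
# dat <- data.table::fread(cmd = marker_filter_cmd("filename.csv"))
```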

FG7