
The data file is called data.csv.

I tried `read.csv('data.csv')` and got an error message saying the file was too large.

I'm not really sure how to use `fread` in this situation, because when I tried:

require(data.table)
DT <- fread("data.csv")

That didn't work.

Any ideas?

I ended up trying to install the "bigmemory" package but it says

Warning in install.packages :
  package ‘bigmemory’ is not available (for R version 3.2.0)
Nick
  • "file is empty" means you didn't put in the correct path. Use `list.files` to make sure the file is in the directory you claim. – MichaelChirico Aug 18 '15 at 14:50
  • A useful tutorial on the `ff` and `bigmemory` packages: http://www.bytemining.com/wp-content/uploads/2010/08/r_hpc_II.pdf Should get you off and running. – Carl Witthoft Aug 18 '15 at 14:52
  • @CarlWitthoft it says I need the latest R version 3.2.0: Warning in install.packages : package ‘bigmemory’ is not available (for R version 3.2.0) Do you know how to update it to that? – Nick Aug 18 '15 at 15:06
  • Nick, if you don't know how to update `R`, how did you install it in the first place? Sounds like you should read the manuals available for download at CRAN. – Carl Witthoft Aug 18 '15 at 17:12
  • I actually figured that part out. And I have version 3.2.0 but it won't open in that. The only one that is higher than that 3.2.2, but this package has been out for awhile. The main thing I am trying to look for is the structure in the code for me to read the csv file. @CarlWitthoft – Nick Aug 18 '15 at 17:17
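As MichaelChirico's comment suggests, a "file is empty" style failure usually means R is looking in the wrong directory, so that is worth ruling out before blaming file size. A minimal sketch (the commented-out folder path is a placeholder, not a real location):

```r
# Check whether R can actually see data.csv from the current working directory
getwd()                          # where R is currently looking
file.exists("data.csv")          # FALSE means the path is wrong
list.files(pattern = "\\.csv$")  # list the csv files that really are there
# If data.csv lives elsewhere, point R at that folder (placeholder path):
# setwd("C:/path/to/folder")
```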

1 Answer


I would suggest that you also try the following code:

tab5rows <- read.table("datatable.txt", header = TRUE, nrows = 5)   # peek at the first 5 rows
classes <- sapply(tab5rows, class)                                  # infer the column classes
tabAll <- read.table("datatable.txt", header = TRUE, colClasses = classes)

This approach, discussed in detail here, can significantly improve the speed of reading big files. More importantly, the first line lets you look inside the file; if you can open it, the scope for manoeuvre is large. Alternatively, it can be worthwhile to read the file in binary:

messy_file <- readLines(file("ProblematicData.csv", "rb"), encoding="UTF-8", skipNul=TRUE)
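If the binary `readLines` route succeeds, the resulting character vector can be parsed back into a data frame via the `text` argument of `read.csv`. A sketch, assuming a comma-separated file with a header row (the `messy_lines` vector below is a stand-in for `messy_file` above):

```r
# Parse the lines captured by readLines back into a data frame
messy_lines <- c("id,value", "1,apple", "2,banana")  # stand-in for messy_file
df <- read.csv(text = messy_lines, stringsAsFactors = FALSE)
str(df)
```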

Edit

In addition, I would suggest that you have a look at this discussion where some options for reading big files are discussed in detail.

My approach to the problem would be:

  1. Try the first option with `read.table`; alternatively,
  2. Try `fread` from the data.table package; alternatively,
  3. Read the file as binary
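The three steps above could be combined into one fallback wrapper. This is a hypothetical sketch, not code from the answer; `read_big_csv` is an invented name, and step 2 assumes the data.table package is installed:

```r
read_big_csv <- function(path) {
  # Step 1: peek at the first rows to infer column classes, then read in full
  out <- tryCatch({
    head5   <- read.csv(path, nrows = 5)
    classes <- sapply(head5, class)
    read.csv(path, colClasses = classes)
  }, error = function(e) NULL)
  if (!is.null(out)) return(out)
  # Step 2: fall back to data.table::fread (requires the data.table package)
  out <- tryCatch(as.data.frame(data.table::fread(path)), error = function(e) NULL)
  if (!is.null(out)) return(out)
  # Step 3: last resort -- read the raw bytes and parse the text ourselves
  con <- file(path, "rb")
  on.exit(close(con))
  read.csv(text = readLines(con, skipNul = TRUE), stringsAsFactors = FALSE)
}
```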
Konrad
  • my data is not a txt file though, it is a csv, so it doesn't read the file datatable.txt @Konrad – Nick Aug 18 '15 at 16:14
  • I think you want "peEk," not "peAk" :-) – Carl Witthoft Aug 18 '15 at 17:13
  • @CarlWitthoft do you know how to use the code above with a csv file? He uses txt files but I don't have that – Nick Aug 18 '15 at 17:22
  • Nick, a "csv" file **is** a text file, with a comma defined as the data delimiter. Don't be fooled by those dang windows file extensions. – Carl Witthoft Aug 18 '15 at 19:17
  • @Nick, the problem of reading `CSV` file via `readLines` was discussed on SO on a couple of occasions, [example](http://stackoverflow.com/questions/6119667/in-r-how-do-i-read-a-csv-file-line-by-line-and-have-the-contents-recognised-as-t). – Konrad Aug 19 '15 at 09:40