
I am using R to do some simple plotting with a .txt file. However, the file is too large: about 1 GB, with 3 columns and 41,633,303 rows.

When I call read.delim2 to import it into a data frame, I get an error saying that vectors cannot be made that big. Is there a way around this?

I eventually want to make faceted histograms with this data.

`head` of the .txt file:

25      Large   Inversion
26      Large   Inversion
27      Large   Inversion
28      Large   Inversion
29      Large   Inversion
30      Large   Inversion
31      Large   Inversion
32      Large   Inversion
33      Large   Inversion
34      Large   Inversion
35      Large   Inversion
36      Large   Inversion
37      Large   Inversion
38      Large   Inversion
39      Large   Inversion
40      Large   Inversion
41      Large   Inversion
42      Large   Inversion
43      Large   Inversion
44      Large   Inversion
45      Large   Inversion
46      Large   Inversion
47      Large   Inversion
48      Large   Inversion
49      Large   Inversion
50      Large   Inversion
51      Large   Inversion
52      Large   Inversion
53      Large   Inversion
54      Large   Inversion
55      Large   Inversion
56      Large   Inversion
57      Large   Inversion
58      Large   Inversion
58      Large   Inversion

Command to read:

mydf <- read.delim2(file = "/home/hayden/polyRNA/Bowtier/datatolook/allData.txt", header = F)

Error:

Error: cannot allocate vector of size 500.0 Mb
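
For reference, here is a minimal sketch of a lower-memory way to read a file of this shape with `data.table::fread` (one of the alternatives raised in the comments below). The path and the three column types come from the question; the column names are assumptions:

```r
# install.packages("data.table")  # if not already installed
library(data.table)

# fread() reads large delimited files much faster and with less memory
# overhead than read.delim2(), and it auto-detects the tab separator.
mydf <- fread("/home/hayden/polyRNA/Bowtier/datatolook/allData.txt",
              header     = FALSE,
              col.names  = c("position", "size", "type"),  # assumed names
              colClasses = c("integer", "character", "character"))
```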
  • `v <- vector(length = 41633303)` - does not seem to be a problem to make a vector that big. I suggest you add the original error message + a sample of your file + the code you used to try to read it. – lukeA Jul 21 '16 at 19:44
  • It has been updated – Nicholas Hayden Jul 21 '16 at 19:51
  • Thanks. What are your memory limits (see `help("Memory-limits")`)? I mean, I'd say 500 MB is not too much for a modern computer. :) In addition, have you tried `read.table` instead of `read.delim2`? Perhaps it's not a tab-separated file. Also, you may want to try `fread` from the `data.table` package or `read_tsv` from the `readr` package, which tend to be much faster... – lukeA Jul 21 '16 at 19:56
  • I have not tried those methods. However, I am working on the command line, and I do not think I have permissions to change the memory I have access to. D: – Nicholas Hayden Jul 21 '16 at 20:06
  • Hmm, well, if you have less than 500 MB of memory available, then I guess you cannot read in a 1 GB file in one go. Although probably not very efficient, one way could be to read and aggregate it line by line using the nlines and skip arguments of e.g. `scan` or `read.delim2` (sketched after these comments). – lukeA Jul 21 '16 at 20:22
  • Have you tried the [`ff`](https://cran.r-project.org/web/packages/ff/ff.pdf) package? Looks like it may be useful for your problem. – Edu Jul 21 '16 at 20:29
  • I'll try those packages and see what I can do. – Nicholas Hayden Jul 21 '16 at 20:36
  • 1
    Possible duplicate of [Quickly reading very large tables as dataframes in R](https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r) – GoGonzo Sep 20 '17 at 07:43
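
Here is a rough sketch of the chunked, line-by-line aggregation suggested in the comments above, using `scan()` with `skip` and `nlines` so that only one chunk of rows is held in memory at a time. The chunk size and the count-based aggregation are assumptions; the resulting counts are small enough to convert to a data frame and plot with `ggplot2` (`geom_col()` + `facet_wrap()`) for the faceted histograms mentioned in the question:

```r
path       <- "/home/hayden/polyRNA/Bowtier/datatolook/allData.txt"
chunk_size <- 1e6          # rows per chunk (assumed)
total_rows <- 41633303     # from the question
counts     <- list()       # running tally keyed by "value<TAB>size<TAB>type"

for (start in seq(0, total_rows - 1, by = chunk_size)) {
  chunk <- scan(path,
                what   = list(integer(), character(), character()),
                skip   = start, nlines = chunk_size, quiet = TRUE)
  # aggregate within the chunk instead of keeping the raw rows
  key <- paste(chunk[[1]], chunk[[2]], chunk[[3]], sep = "\t")
  tab <- table(key)
  for (k in names(tab)) {
    counts[[k]] <- if (is.null(counts[[k]])) tab[[k]] else counts[[k]] + tab[[k]]
  }
}

# counts now holds one total per distinct (value, size, type) combination.
```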

0 Answers