
I am trying to read a 2.51 GB .txt data file into R. I have tried using fread from the data.table package.

Here is my example code.

library(data.table)

Testing <- fread("C:/Users/Juand/Documents/FRED/Data/testing data/LabDataExport_20220908.txt", header = TRUE, sep = "|")

I keep getting this error message:

Error in fread("C:/Users/Juand/Documents/FRED/Data/testing data/LabDataExport_20220908.txt",  : 
  R character strings are limited to 2^31-1 bytes

From my research, some people report successfully loading files larger than 6 GB with fread, while others state that R cannot load files larger than 2 GB because of a memory size limitation.

I have been using this link below as my guide.

READING LARGE CSV FILE WITH R

What are my options here?

Structure of the file: [image not included]

The file has 9,621,354 lines, and the delimiter appears to be "|".

  • The problem isn't the total size of your file; the problem is that at least one individual string within your file is too long, more than 2^31 - 1 bytes. If you could provide some more context about the structure of your file, that could help us help you. – Gregor Thomas Feb 13 '23 at 16:04
  • It sounds like the file might be improperly formatted. Are you sure `|` is the correct delimiter? How many lines of data are in this file? Are lines separated with a new-line character? – MrFlick Feb 13 '23 at 16:16
  • Hello, I made edits to my original post demonstrating what the file looks like. @GregorThomas – Majorian420 Feb 13 '23 at 16:44
  • Hello, I made edits to my original post demonstrating what the file looks like. @MrFlick – Majorian420 Feb 13 '23 at 16:45
  • From what we can see, that file looks properly formatted. But that error suggests that somewhere in the file there's an issue. I'd suggest using a command line tool to look for the longest lines, [something like this](https://stackoverflow.com/questions/1655372/longest-line-in-a-file), to see if you can find a problem or irregularity. – Gregor Thomas Feb 13 '23 at 16:47
  • What happens if you use `fread` with the `nrows=3` argument? Does it properly read the first 3 rows? – Waldi Feb 13 '23 at 16:55
  • I bet there's an unmatched quote in there somewhere. If R sees a `"` but not a closing `"` it will assume everything after is all one string which would explain the large string size error. If you don't expect your string values to use quotes, you can also set `quote=""` in `fread` – MrFlick Feb 13 '23 at 18:00
  • @Waldi I added `nrows=3` and R read the first three rows of the .txt file as a small sample, but I also got this back in the console > `##R character strings are limited to 2^31-1 bytes > ## Won't create variable due to memory limits for R`. I am curious why I keep receiving warnings about memory limits in R. – Majorian420 Feb 13 '23 at 18:36
  • See https://github.com/Rdatatable/data.table/issues/5338 – Waldi Feb 13 '23 at 18:50
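
The two diagnostics suggested in the comments (look for an abnormally long line, and turn off quote handling) could be sketched in R roughly as follows. This is a minimal sketch rather than a confirmed fix: `longest_line` is a hypothetical helper, the small demo file is invented for illustration, and the `quote = ""` step assumes the data never uses quotes to wrap fields. If a single line really exceeds 2^31-1 bytes, `readLines` may fail on it as well, in which case a command line tool is the safer route.

```r
# Hypothetical helper: scan a text file in chunks and report the longest line,
# so the whole file never has to sit in memory at once.
longest_line <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  best_len <- -1
  best_line <- NA_integer_
  offset <- 0
  repeat {
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break
    lens <- nchar(lines, type = "bytes")   # length in bytes, not characters
    i <- which.max(lens)
    if (lens[i] > best_len) {
      best_len <- lens[i]
      best_line <- offset + i
    }
    offset <- offset + length(lines)
  }
  list(line = best_line, bytes = best_len, total_lines = offset)
}

# Invented demo file: pipe-delimited, with one stray double quote on row 2.
demo_path <- tempfile(fileext = ".txt")
writeLines(c("id|name|value",
             "1|alpha|10",
             "2|be\"ta|20",
             "3|gamma|30"),
           demo_path)

res <- longest_line(demo_path, chunk_size = 2L)
print(res)

# With quote = "" the stray quote is read as an ordinary character
# instead of opening a quoted field (assumes data.table is installed).
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(demo_path, sep = "|", header = TRUE, quote = "")
  print(dt)
}
```

On the real file, the same call would be `longest_line("C:/Users/Juand/Documents/FRED/Data/testing data/LabDataExport_20220908.txt")`. A longest line far larger than a typical row, or a `total_lines` that differs from the expected 9,621,354, would point to a formatting problem near the reported line number.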
