
I'm trying to load a big file (~6 GB) into memory with fs.readFileSync on a server with 96 GB of RAM.
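Roughly like this (a minimal sketch of what I'm doing; the file path is just a placeholder):

const fs = require('fs');

// Attempt to read the whole ~6 GB file into a single Buffer in one call
const data = fs.readFileSync('/path/to/big-file.bin');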

The problem is that it fails with the following error message:

RangeError: Attempt to allocate Buffer larger than maximum size: 0x3fffffff bytes

Unfortunately, I couldn't find a way to increase this Buffer limit; it seems to be a hard-coded constant.

How can I overcome this problem and load a big file with Node.js?

Thank you!

com
  • You will likely want to deal with the file in chunks. What are you attempting to do with the file? Do you really need the entire thing in RAM at once? – jfriend00 Apr 21 '15 at 08:45
  • Yes, I need the entire file in RAM at once. The file contains a big hash and I need to work with the entire hash. – com Apr 21 '15 at 08:53
  • What do you mean it "contains a big hash"? Do you mean it contains data that you're going to put into a hash table? Why can't you process it in pieces? – jfriend00 Apr 21 '15 at 08:58

2 Answers


I also ran into this problem when trying to load a 6.4 GB video file to create a file hash. Reading the whole file with fs.readFile() caused a RangeError [ERR_FS_FILE_TOO_LARGE]. Using a stream instead worked:

const crypto = require('crypto');
const fs = require('fs');

// Hash the file incrementally as it streams in, instead of reading it all at once
const hash = crypto.createHash('md5');
const stream = fs.createReadStream(file_path);

stream.on('data', chunk => { hash.update(chunk); });
stream.on('end', () => {
    const hashCheckSum = hash.digest('hex');
    // Save the hashCheckSum into the database.
});
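One thing worth adding to this sketch: the read stream can also emit an 'error' event, which you probably want to handle so a failed read doesn't go unnoticed:

stream.on('error', err => {
    // Handle read failures (missing file, permission errors, etc.)
    console.error('Failed to read file:', err);
});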

Hope it helped.

NgaNguyenDuy

From a Joyent FAQ:

What is the memory limit on a node process?

Currently, by default v8 has a memory limit of 512mb on 32-bit systems, and 1gb on 64-bit systems. The limit can be raised by setting --max_old_space_size to a maximum of ~1024 (~1 GiB) (32-bit) and ~1741 (~1.7GiB) (64-bit), but it is recommended that you split your single process into several workers if you are hitting memory limits.
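For reference, the flag is passed when launching Node (the value is in MB), and you can check the effective heap limit from inside a process with the built-in v8 module. A small sketch (the script name is a placeholder):

// Start Node with a larger old-space limit, e.g.:
//   node --max-old-space-size=1741 app.js
// Then verify the effective limit from inside the process:
const v8 = require('v8');
const limitMB = v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
console.log('V8 heap size limit: ' + Math.round(limitMB) + ' MB');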

If you show more detail about what's in the file and what you're doing with it, we can probably offer some ideas on how to work with it in chunks. If it's pure data, then you probably want to be using a database and let the database handle getting things from disk as needed and manage the memory.
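As an illustration of the chunked approach (a sketch on my part, assuming the file is line-oriented text), you can stream it and handle one line at a time with the built-in readline module, so only a small window of data sits in memory:

const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
    input: fs.createReadStream('/path/to/big-file.txt') // placeholder path
});

rl.on('line', line => {
    // Process a single line here instead of holding the whole file in memory
});

rl.on('close', () => {
    console.log('Finished reading file');
});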

Here's a fairly recent discussion of the issue: https://code.google.com/p/v8/issues/detail?id=847

And, here's a blog post that claims you can edit the V8 source code and rebuild Node to remove the memory limit. Try this at your own discretion.

jfriend00
  • Thank you very much for your answer. The file contains [word embeddings](http://en.wikipedia.org/wiki/Word_embedding) for a large vocabulary, where every word is mapped to a 300-dimensional vector. The main task is supervised classification, where the training set is a big corpus in which I need to replace every word with the appropriate vector. – com Apr 21 '15 at 09:20
  • @fog - I can't say I fully understand what you're doing, but I think you're going to want to manage this 6GB of data in some sort of database and use the database to find and fetch pieces of the data as needed. Or, maybe node.js isn't the right tool for the job. – jfriend00 Apr 21 '15 at 09:24
  • I tried to use Redis for this collection, but it was very slow even when caching vectors I had already used. – com Apr 21 '15 at 09:28
  • I am not sure this is accurate. If I have a file that is 3.6 GB and use fs.readFileSync('file.txt'), it throws `RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3941361242) is greater than 2 GB` (Node 15.14). I can Buffer.alloc a buffer larger than 2 GB, and setting NODE_OPTIONS='--max-old-space-size=7000' makes no difference. – Colin D Nov 08 '21 at 14:12