
I am loading > 220K rows into a sqlite3 db. Each row is stored in a separate file, hence > 220K files.

const fs = require('fs');
const path = require('path');

fs.readdir(dir, (err, files) => {
    files.forEach(file => {

        fs.readFile(path.join(dir, file), 'utf8', (err, data) => {

            //.. process file and insert into db ..

        });
    });
});

The above causes the `Error: EMFILE: too many open files` error. From what I understand, I shouldn't have to close the files, because apparently `fs.readFile` opens the file, reads it, and then closes it for me. I am on Mac OS X, and my ulimit is set to 8192:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 8192
pipe size            (512 bytes, -p) 1
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 709
virtual memory          (kbytes, -v) unlimited

What can I do to get past this error?

punkish
  • This all happens asynchronously, so at some point your code throws (because it keeps opening new files without waiting for earlier ones to be closed). I would suggest that you read your files sequentially in this case. – Rob Apr 05 '18 at 13:00
  • The documentation says there is a `readFileSync` as well. Maybe it is more efficient to use `readFile`, but then use it on only a few files at a time. – Arndt Jonasson Apr 05 '18 at 13:02
  • The command `prlimit` on Linux shows me that my computer has the limit 1024 but it can be raised to 1048576. Maybe you can do the same. Whether it's a good idea to actually have that many files open at a time, I don't know. – Arndt Jonasson Apr 05 '18 at 13:14
  • yes, using `fs.readFileSync` solved the issue (see the sketch below). Many thanks all. – punkish Apr 05 '18 at 14:30
  • Keep in mind that using `fs.readFileSync` will only ever open one file at a time, because it's a blocking process. – Aramil Rey Apr 09 '18 at 13:06
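
For reference, here is a minimal sketch of the synchronous approach that solved it for punkish: reading one file at a time keeps only a single descriptor open, at the cost of blocking the event loop (`dir` is assumed to be the same directory variable as in the question).

const fs = require('fs');
const path = require('path');

// Read the files one at a time; only one descriptor is open at any moment.
for (const file of fs.readdirSync(dir)) {
    const data = fs.readFileSync(path.join(dir, file), 'utf8');
    //.. process file and insert into db ..
}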

1 Answer


Solution

You can solve this issue by queuing up the `readFile` operations as soon as there is an EMFILE error, and only executing reads after something has been closed. Luckily, this is exactly what graceful-fs does, so simply replacing the `fs` module with graceful-fs will fix your issue:

const fs = require('graceful-fs');
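
For concreteness, here is the question's loop with only the require swapped in; nothing else needs to change (a sketch, with `dir` assumed to be the same directory variable as in the question):

const fs = require('graceful-fs'); // drop-in replacement for the fs module
const path = require('path');

fs.readdir(dir, (err, files) => {
    if (err) throw err;
    files.forEach(file => {
        // graceful-fs queues this read if EMFILE is hit, and retries it
        // automatically once earlier reads have closed their descriptors
        fs.readFile(path.join(dir, file), 'utf8', (err, data) => {
            if (err) throw err;
            //.. process file and insert into db ..
        });
    });
});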

Problem

Due to the async nature of Node, your process tries to open more files than allowed (8192), so it produces an error. Each iteration of your loop starts reading a file and then immediately continues with the next iteration.

To read them, the files are opened, but they are not closed until the read succeeds or fails.
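
If you would rather not add a dependency, you can cap the number of concurrent reads yourself. The following is a rough sketch of that idea; the `readAllLimited` helper and the limit of 100 are illustrative assumptions, not part of any library:

const fs = require('fs');
const path = require('path');

// Hypothetical helper: read every file in `dir`, but keep at most
// `limit` file descriptors open at any one time.
function readAllLimited(dir, limit, processFile) {
    fs.readdir(dir, (err, files) => {
        if (err) throw err;
        let index = 0;
        let active = 0;

        function next() {
            while (active < limit && index < files.length) {
                const file = files[index++];
                active++;
                fs.readFile(path.join(dir, file), 'utf8', (err, data) => {
                    active--;
                    if (!err) processFile(file, data);
                    next(); // a descriptor just closed, so start another read
                });
            }
        }
        next();
    });
}

// Usage: at most 100 files open at a time
readAllLimited(dir, 100, (file, data) => {
    //.. process file and insert into db ..
});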

Aramil Rey