
I keep getting a 'Too many open files' error when doing something like this:

import os
import pandas

# read file names
file_names = []
for file_name in os.listdir(path):
    if not file_name.endswith('.json'):
        continue
    file_names.append(file_name)

# process file names...

# iterate over the files
for file_name in file_names:

    # load the file into a DataFrame
    file_path = os.path.join(path, file_name)
    df = pandas.read_json(file_path)

    # process the data, etc...
    # not real var names, just for illustration purposes...

    json_arr_1 = ...
    json_arr_2 = ...

    # save DF1 to a new file
    df_1 = pandas.DataFrame(data=json_arr_1)
    file_name2 = os.path.join(os.getcwd(), 'db', folder_name, file_name)
    df_1.to_json(file_name2, orient='records')

    # save DF2 to a new file
    df_2 = pandas.DataFrame(data=json_arr_2)
    file_name3 = os.path.join(os.getcwd(), 'db', 'other', folder_name, file_name)
    df_2.to_json(file_name3, orient='records')

The DataFrame documentation doesn't mention having to open or close files, and I don't think os.listdir keeps pointers to open files (it should just return a list of strings).

Where am I going wrong?

William Falcon

1 Answer


This seems like a system issue, not a pandas issue.

You might need to increase the system limit on open files.

How to increase the limit: https://easyengine.io/tutorials/linux/increase-open-files-limit/
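
A minimal sketch of checking and raising the limit from within Python itself, using the standard-library resource module (Unix-only; raising the hard limit still requires root or a system-wide change):

import resource

# query the current soft and hard limits on open file descriptors
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft, 'hard limit:', hard)

# raise this process's soft limit up to the hard limit
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))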

The following Q&A, IOError: [Errno 24] Too many open files, discusses ulimit and the limit on open files.

This Q&A discusses why the number of open files is limited in Linux: https://unix.stackexchange.com/questions/36841/why-is-number-of-open-files-limited-in-linux
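
If raising the limit isn't an option, a defensive workaround is to manage the read handle yourself so it is closed deterministically on every iteration. This is only a sketch: pandas.read_json accepts an open file object as well as a path, so the loop from the question could be written as:

import os
import pandas

for file_name in file_names:
    file_path = os.path.join(path, file_name)
    # the with-block guarantees the handle is closed on exit,
    # regardless of how pandas manages files internally
    with open(file_path) as fh:
        df = pandas.read_json(fh)
    # ... process and write out as before ...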

Yaron
  • But the number of files could vary, so this could happen for n files × (1 read + 2 saves). Should the limit be 3, or should it be n*3? If I don't know n ahead of time, I can't set a limit. – William Falcon May 16 '16 at 14:13
  • How many files are being opened in your program? The system limit should be high enough for "normal" operation. – Yaron May 16 '16 at 14:15
  • That's my question... Am I opening more than 3? I think I'm only opening 3: one to read JSON, two to write JSON. So 3 in total. But it would be more than 3 if pandas didn't release the files on each loop... – William Falcon May 16 '16 at 14:34
  • I see the following line in your code: "for file_name in os.listdir(path)" - how many ".json" files do you have in the directory named "path"? – Yaron May 16 '16 at 14:44
  • About 6000 files. But that just lists the file names... no? It doesn't actually open them. The crash happens in the second for loop, though, not while reading from the directory. – William Falcon May 16 '16 at 15:00
  • I'm not sure that you are correct... see http://stackoverflow.com/a/4099039/5088142: "You're hitting a historical artifact in Python: os.listdir should return an iterator, not an array. I think this function predates iterators; it's odd that no os.xlistdir has been added." – Yaron May 16 '16 at 15:05
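
Following up on the comment thread: one way to settle whether handles leak on each iteration is to count the process's open file descriptors inside the loop. A Linux-only sketch (it reads /proc/self/fd; on other Unix systems, psutil.Process().num_fds() reports the same count):

import os

def open_fd_count():
    # /proc/self/fd holds one entry per descriptor this process has open
    return len(os.listdir('/proc/self/fd'))

for file_name in file_names:
    # ... read, process, and write as in the question ...
    print(file_name, open_fd_count())

If the count grows with each iteration, something is holding files open; if it stays flat, the limit is simply too low for the workload.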