
I have a project that needs to read data, then write to more than 23 CSV files in parallel, depending on each line. For example, if the line is about temperature, we should write to temperature.csv; if it is about humidity, to humid.csv; etc.

I tried the following:

with open('Results\\GHCN_Daily\\MetLocations.csv', 'wb+') as locations, \
        open('Results\\GHCN_Daily\\Tmax.csv', 'wb+') as tmax_d, \
        open('Results\\GHCN_Daily\\Tmin.csv', 'wb+') as tmin_d, \
        open('Results\\GHCN_Daily\\Snow.csv', 'wb+') as snow_d, \
        # ... 23 'open' statements in total ...
        open('Results\\GHCN_Daily\\SnowDepth.csv', 'wb+') as snwd_d, \
        open('Results\\GHCN_Daily\\Cloud.csv', 'wb+') as cloud_d, \
        open('Results\\GHCN_Daily\\Evap.csv', 'wb+') as evap_d, \

I got the following error:

SystemError: too many statically nested blocks

I searched for this error and got to this post, which says:

You will encounter this error when you nest blocks more than 20 deep. This is a design decision of the Python interpreter to restrict it to 20.

But the open statements I wrote open the files in parallel, not nested.

What am I doing wrong, and how can I solve this problem?

Thanks in advance.

  • I'm not sure if it's right for the work you want to do, but why not just open and read the files, store them in a dictionary with filenames as keys, work on it, and write them when you are done? – hashcode55 Jan 24 '17 at 15:52
  • Or you could make an excel workbook and make a bunch of different worksheets – aberger Jan 24 '17 at 16:05
  • Thanks very much for your comments and suggestions @aberger. Actually, the original data is huge (47 GB), and hence the resultant tables will be as well; that's why it is not possible to use Excel or even a single Access database for it. – Mohammad ElNesr Jan 25 '17 at 08:59
  • Thank you for your comment @hashcode55. I will try your suggestion. – Mohammad ElNesr Jan 25 '17 at 09:01

4 Answers


Each open is a nested context; it's just that Python syntax allows you to put them in a comma-separated list, so the compiler still counts every open against its static nesting limit.
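The language reference defines the multi-item form as equivalent to nested with statements, so these two snippets compile the same way:

with open('a.csv', 'wb+') as a, open('b.csv', 'wb+') as b:
    pass

# is compiled as if it were written:
with open('a.csv', 'wb+') as a:
    with open('b.csv', 'wb+') as b:
        pass

contextlib.ExitStack is a context container that lets you put as many contexts as you like in a stack and exits each of them when you are done. So, you could do: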

import contextlib

files_to_process = (
    ('Results\\GHCN_Daily\\MetLocations.csv', 'locations'),
    ('Results\\GHCN_Daily\\Tmax.csv', 'tmax_d'),
    ('Results\\GHCN_Daily\\Tmin.csv', 'tmin_d'),
    # ...
)

with contextlib.ExitStack() as stack:
    # open every file for writing ('wb+' as in the question) and register it on the stack
    files = {varname: stack.enter_context(open(filename, 'wb+'))
             for filename, varname in files_to_process}
    # and for instance...
    files['locations'].write('my location\n')
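For the fan-out itself, a minimal sketch of the read-and-route loop might look like this (the input path, the element codes, and the field positions are all assumptions for illustration):

element_to_varname = {'TMAX': 'tmax_d', 'TMIN': 'tmin_d', 'SNOW': 'snow_d'}  # hypothetical mapping

with contextlib.ExitStack() as stack:
    files = {varname: stack.enter_context(open(filename, 'wb+'))
             for filename, varname in files_to_process}
    with open('Results\\GHCN_Daily\\source.dly') as source:  # hypothetical input file
        for line in source:
            element = line[17:21]  # assumed position of the element code in each record
            varname = element_to_varname.get(element)
            if varname is not None:
                files[varname].write(line)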

If you find dict access less tidy than attribute access, you could create a simple container class:

class SimpleNamespace:

    def __init__(self, name_val_pairs):
        self.__dict__.update(name_val_pairs)

with contextlib.ExitStack() as stack:
    files = SimpleNamespace((varname, stack.enter_context(open(filename, 'wb+')))
                            for filename, varname in files_to_process)
    # and for instance...
    files.locations.write('my location\n')
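On Python 3.3+ you don't need to define this class yourself; types.SimpleNamespace in the standard library behaves the same way (a sketch equivalent to the above):

from types import SimpleNamespace  # available since Python 3.3

with contextlib.ExitStack() as stack:
    files = SimpleNamespace(**{varname: stack.enter_context(open(filename, 'wb+'))
                               for filename, varname in files_to_process})
    files.locations.write('my location\n')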
tdelaney
  • Many thanks @tdelaney, this solved the problem. **Additional note:** I use Python 2.7, where `contextlib.ExitStack()` is not available, so I imported the alternative library `contextlib2`. So, Py2.7 users should import `contextlib2` instead of `contextlib`. – Mohammad ElNesr Jan 25 '17 at 10:09

I would have a list of the possible files, e.g. possible_files = ['humidity', 'temperature', ...], and make a dict that contains, for each possible file, a path and a DataFrame, for example:

import pandas as pd

main_dic = {}

for file in possible_files:
    main_dic[file] = {}  # each entry must be initialised before assigning its keys
    main_dic[file]['path'] = '%s.csv' % file
    main_dic[file]['data'] = pd.DataFrame([], columns=['value', 'other_column', 'another_column'])  # ... more columns

Afterwards, I would read whatever document you are getting the values from and store them in the proper dictionary DataFrame.
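A minimal sketch of that read step (the source file name and the line layout are assumptions for illustration):

with open('source.csv') as source:  # hypothetical input file
    for line in source:
        category, value = line.strip().split(',')[:2]  # assumed 'category,value,...' layout
        if category in main_dic:
            df = main_dic[category]['data']
            df.loc[len(df), 'value'] = value  # append a new row, filling the 'value' column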

When finished, just save the data as CSV, for example:

for file in main_dic:
    main_dic[file]['data'].to_csv(main_dic[file]['path'], index=False)

Hope it helps.

epattaro
  • I don't see how a bunch of in-core DataFrames are a good solution here. He's just fanning out to multiple CSVs. This will bloat memory and make the solution less scalable. – tdelaney Jan 24 '17 at 16:23

If the data is not very huge, why not read in all the data, group it by category (e.g. put all data about temperature into one group), then write each group to its corresponding file in one go?
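A minimal sketch of that approach with pandas (the source path and the 'category' column name are assumptions):

import pandas as pd

df = pd.read_csv('source.csv')  # hypothetical input file
for category, group in df.groupby('category'):
    group.to_csv('%s.csv' % category, index=False)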

OceanBlue

It is fine to open more than 20 files this way:

# your list of file names
file_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u']

fh = []  # list of file handles
for f in file_names:  # the original looped over an undefined `files` and never used the enumerate index
    fh.append(open(f + '.txt', 'w'))

# do what you need here
print("done")

for f in fh:
    f.close()

Though I am not sure you really need to do so.

JenkinsY
  • But OP wants to put them in a `with` clause so that the files are closed automatically on exit, even in the case when one of the files fails to open. Your solution doesn't have that functionality. – tdelaney Jan 24 '17 at 16:30