
Very novice attempt at Python here.

I tried implementing something like what was discussed in this question: Splitting csv file based on a particular column using Python

My goal is to take a file with 15 million lines covering 500 ticker symbols and put each ticker in its own file.

However, when I run it, I get:

OSError: [Errno 24] Too many open files: 'APH.csv'

All of the lines of data are in order (i.e. all of the lines for ticker "A" come one right after another), so I could close each file before moving on to the next one. I'm just not sure where in this code I would do that. FYI - this is on a Mac, if that matters.

My code is

import csv

with open('WIKI_PRICES_big.csv') as fin:    
    csvin = csv.DictReader(fin)
    # Category -> open file lookup
    outputs = {}
    for row in csvin:
        cat = row['ticker']
        # Open a new file and write the header
        if cat not in outputs:
            fout = open('{}.csv'.format(cat), 'w')
            dw = csv.DictWriter(fout, fieldnames=csvin.fieldnames)
            dw.writeheader()
            outputs[cat] = fout, dw
        # Always write the row
        outputs[cat][1].writerow(row)
    # Close all the files
    for fout, _ in outputs.values():
        fout.close()

1 Answer


Based on the file structure you describe, the following should do it.

The trick is that if the ticker values are always in order, you only need to keep a single output file open at any one time. You can close the old file and open a new one whenever you come across a new ticker value.

import csv

fout = None
with open('WIKI_PRICES_big.csv') as fin:
    csvin = csv.DictReader(fin)
    seen = set()

    for row in csvin:
        cat = row['ticker']

        # New ticker: close the previous file and open a new one.
        if cat not in seen:
            seen.add(cat)

            if fout is not None:  # Close old file if we have one.
                fout.close()

            fout = open('{}.csv'.format(cat), 'w')
            dw = csv.DictWriter(fout, fieldnames=csvin.fieldnames)
            dw.writeheader()

        # Always write the row.
        dw.writerow(row)

    if fout is not None:
        fout.close()
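
If the error still appears with this single-open-file version, it may be worth checking whether something else in the process is holding files open, and what the per-process file-descriptor limit actually is. A minimal sketch using Python's resource module (Unix/macOS only; the target of 4096 is just an illustrative value, not something from your setup):

import resource

# Current per-process limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft, 'hard limit:', hard)

# Optionally raise the soft limit (it cannot exceed the hard limit).
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))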
  • This is still giving a too many open files error. I'm going to take a different approach tomorrow... Thank you – Mary Feb 17 '19 at 03:31
  • @Mary are you sure you copied and ran this code correctly? I've tested this here (also on a Mac) up to 10,000 output files without issue. – mfitzp Feb 17 '19 at 09:30