
I want to write rows into a CSV file, but each file should contain no more than X rows. If the threshold is exceeded, it needs to start a new file. So if I have the following data:

csv_max_rows=3
columns = ["A", "B", "C"]
rows = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
    ["a3", "b3", "c3"],
    ["a4", "b4", "c4"],
    ["a5", "b5", "c5"],
    ["a6", "b6", "c6"],
    ["a7", "b7", "c7"],
    ["a8", "b8", "c8"],
    ["a9", "b9", "c9"],
    ["a10", "b10", "c10"]
]

I want to end up with 4 files, where files 1, 2, and 3 will have 3 rows each and file 4 will have only one row. Is there a built-in option to do that with Python's csv writer?


2 Answers


I think your requirements are too specific to expect a built-in option in the standard library. The solution below is kinda hacky, but I think that's exactly what you want.

import csv

csv_max_rows = 3
columns = ["A", "B", "C"]
rows = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
    ["a3", "b3", "c3"],
    ["a4", "b4", "c4"],
    ["a5", "b5", "c5"],
    ["a6", "b6", "c6"],
    ["a7", "b7", "c7"],
    ["a8", "b8", "c8"],
    ["a9", "b9", "c9"],
    ["a10", "b10", "c10"],
]

for i, row in enumerate(rows):
    # Every csv_max_rows rows, start a new file: out_1.csv, out_2.csv, ...
    if (i % csv_max_rows) == 0:
        fp = open(f"out_{i//csv_max_rows+1}.csv", "w", newline="")
        writer = csv.writer(fp)
        writer.writerow(columns)  # repeat the header row in every file
    writer.writerow(row)
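
One thing worth noting: the loop never closes the files it opens; it relies on the file objects being garbage-collected or on the program exiting. A minimal sketch of the same splitting that closes each file deterministically, assuming the same columns, rows, and csv_max_rows as above:

import csv

# One file per chunk of csv_max_rows rows; each file is closed as soon as
# its chunk has been written.
for file_no, start in enumerate(range(0, len(rows), csv_max_rows), start=1):
    with open(f"out_{file_no}.csv", "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(columns)  # repeat the header row in every file
        writer.writerows(rows[start:start + csv_max_rows])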

user23952
  • Maybe `if i % csv_max_rows == 0:` to be more clear? Otherwise, good answer :) – Zach Young Dec 29 '21 at 00:59
  • Thanks @Zach Young, I agree that your suggestion is more clear. It appears that `not (i % csv_max_rows):` is a quirk I picked up. I'll edit the answer. – user23952 Dec 29 '21 at 09:33

I'm not sure if there is a built-in option, but this is not so complicated to achieve:

from typing import List
import csv
import concurrent.futures


def chunks(lst: List, n: int):
    # Yield successive chunks of at most n items from lst.
    while lst:
        chunk = lst[0:n]
        lst = lst[n:]
        yield chunk
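
# For example, list(chunks(rows, 3)) on the 10 example rows below yields
# four chunks of sizes 3, 3, 3 and 1, one per output file.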


def write_csv(csv_file_path: str, columns: List[str], rows: List[List]):
    # Write the header row followed by the given rows to a single CSV file.
    with open(csv_file_path, 'w', newline='') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(columns)
        for row in rows:
            csv_writer.writerow(row)

def write_csv_parallel(base_csv_file_path: str, columns: List[str], rows: List[List], csv_max_rows: int) -> List[str]:
    # Split the rows into chunks and write each chunk to its own file using a thread pool.
    chunked_rows = chunks(rows, csv_max_rows)
    csv_writing_args = [(f"{base_csv_file_path}.{idx + 1}", columns, chunk_of_rows)
                        for idx, chunk_of_rows in enumerate(chunked_rows)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(lambda f: write_csv(*f), csv_writing_args)
    return [args[0] for args in csv_writing_args]


if __name__ == "__main__":
    columns = ["A", "B", "C"]
    rows = [
        ["a1", "b1", "c1"],
        ["a2", "b2", "c2"],
        ["a3", "b3", "c3"],
        ["a4", "b4", "c4"],
        ["a5", "b5", "c5"],
        ["a6", "b6", "c6"],
        ["a7", "b7", "c7"],
        ["a8", "b8", "c8"],
        ["a9", "b9", "c9"],
        ["a10", "b10", "c10"]
    ]
    base_csv_file_path = "/tmp/test_file.csv"
    csv_file_paths = write_csv_parallel(base_csv_file_path, columns, rows, csv_max_rows=3)
    print("data was written into the following files: \n" + "\n".join(csv_file_paths))
Shoham