
I have some code that makes an API call, formats the data, and appends rows to a CSV file. Due to concerns about thread safety, I collect all rows in a list before writing them to the CSV.

def get_data(data_unit):
     # makes an API call to fetch the data for data_unit
     return data


def extract_data(data, results):
    # formats the data returned from the API call as a list (one CSV row)
    # and appends it to results
    row = formatted_data
    results.append(row)


results = []  # list of lists; each inner list is one row for the CSV
with futures.ThreadPoolExecutor(max_workers=64) as executor:
    for data in executor.map(get_data, data_units):
        extract_data(data, results)
# write results to csv

Is there a more canonical/faster way to do this? I have looked at the answer here: Multiple threads writing to the same CSV in Python, and I don't want to put a lock in extract_data to do the write, because the threads would bottleneck on it and slow down the API calls. For example, is there another data structure I could use instead of the results list (something like a thread-safe stack) that I could pop rows off to write to the CSV while new rows keep getting added to it?

Andrew

1 Answer


No matter which structure you use to replace your list, it will necessarily use locks internally. You can use a queue, for example, which is thread-safe, but it achieves that thread safety with an internal lock.
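As a minimal sketch of the queue-based pattern: worker threads push rows onto a `queue.Queue` and a single writer thread pops them off and appends to the CSV, so only one thread ever touches the file. The worker below fakes the API call with a placeholder row; swap in your real `get_data`/formatting logic.

```python
import csv
import queue
import threading
from concurrent import futures

row_queue = queue.Queue()   # thread-safe FIFO (locks internally)
SENTINEL = object()         # tells the writer thread to stop

def writer(path):
    """Single consumer: pop rows off the queue and append them to the CSV."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        while True:
            row = row_queue.get()
            if row is SENTINEL:
                break
            w.writerow(row)

def worker(data_unit):
    """Producer: placeholder for the API call + formatting; enqueues one row."""
    row = [data_unit, data_unit * 2]   # stand-in for the real formatted data
    row_queue.put(row)

t = threading.Thread(target=writer, args=("out.csv",))
t.start()
with futures.ThreadPoolExecutor(max_workers=8) as executor:
    list(executor.map(worker, range(100)))
row_queue.put(SENTINEL)   # all producers done; let the writer exit
t.join()
```

The queue still locks on every `put`/`get`, but the critical section is tiny (a list append) compared to holding a lock across a `csv.writer.writerow` call, so API-bound threads spend almost no time blocked.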

Thibaut D.