I have some code that makes an API call, formats the data, and appends it to a CSV. Due to concerns about thread safety, I store all rows in a list and only write them to the CSV at the end.
from concurrent import futures

def get_data(data_unit):
    # makes the API call to fetch the data for data_unit
    ...
    return data

def extract_data(data, results):
    # formats the data returned by the API call into a row and appends it
    row = formatted_data  # placeholder for the actual formatting
    results.append(row)

results = []  # list of lists; each inner list is one row of the CSV
with futures.ThreadPoolExecutor(max_workers=64) as executor:
    for data in executor.map(get_data, data_units):
        extract_data(data, results)
# write results to csv
Is there a more canonical/faster way to do this? I have looked at the answer to Multiple threads writing to the same CSV in Python, but I don't want to put a lock inside extract_data, because making every thread wait on the write would bottleneck the API calls. Is there another data structure I could use instead of the results list (something like a thread-safe stack or queue) that one consumer could pop rows off of and write to the CSV, while the worker threads keep adding to it? Something like the sketch below is what I have in mind.
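This is a minimal, untested sketch of that producer-consumer idea using queue.Queue (which is thread-safe) and a single dedicated writer thread. get_data and data_units are the same placeholders as above, and format_row is a hypothetical stand-in for the formatting step inside extract_data:

import csv
import queue
import threading
from concurrent import futures

row_queue = queue.Queue()
SENTINEL = object()  # tells the writer that no more rows are coming

def writer_worker(path):
    # single consumer: pop rows off the queue and write them as they arrive
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        while True:
            row = row_queue.get()
            if row is SENTINEL:
                break
            writer.writerow(row)

def worker(data_unit):
    data = get_data(data_unit)       # the API call, same as above
    row_queue.put(format_row(data))  # Queue.put is thread-safe, no lock needed

writer_thread = threading.Thread(target=writer_worker, args=("out.csv",))
writer_thread.start()
with futures.ThreadPoolExecutor(max_workers=64) as executor:
    executor.map(worker, data_units)  # the with block waits for all workers
row_queue.put(SENTINEL)  # all workers done; tell the writer to finish
writer_thread.join()

Since only the writer thread ever touches the file, the API threads never block on disk I/O and no lock is needed around the csv writer.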