
I have some code that makes an API call, formats the data, and appends rows to a CSV file. Due to concerns about thread safety, I collect all rows in a list before writing them to the CSV.

def get_data(data_unit):
     # makes an API call to fetch the data for data_unit
     return data


def extract_data(data, results):
    # formats the data returned from the API call as a list (one CSV row)
    # and appends it to results
    row = formatted_data
    results.append(row)


results = []  # list of lists; each inner list is one row for the CSV
with futures.ThreadPoolExecutor(max_workers=64) as executor:
    for data in executor.map(get_data, data_units):
        extract_data(data, results)
# write results to csv

Is there a more canonical/faster way to do this? I have looked at the answer here: Multiple threads writing to the same CSV in Python, and I don't want to put a lock in extract_data to do the write, because the threads would bottleneck on it and slow down the API calls. For example, is there another data structure I could use instead of the results list (something like a thread-safe stack) that I could pop rows off to write to the CSV while new rows keep getting added to it?

Andrew

1 Answer


No matter which structure you use to replace your list, it will necessarily use locks internally. You can use a queue, for example, which is thread-safe, but it achieves that thread safety with an internal lock.
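As a minimal sketch of the queue-based pattern: worker threads push rows onto a `queue.Queue` and a single writer thread pops them off and appends to the CSV, so only one thread ever touches the file. The worker below fakes the API call with a placeholder row; swap in your real `get_data`/formatting logic.

```python
import csv
import queue
import threading
from concurrent import futures

row_queue = queue.Queue()   # thread-safe FIFO (locks internally)
SENTINEL = object()         # tells the writer thread to stop

def writer(path):
    """Single consumer: pop rows off the queue and append them to the CSV."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        while True:
            row = row_queue.get()
            if row is SENTINEL:
                break
            w.writerow(row)

def worker(data_unit):
    """Producer: placeholder for the API call + formatting; enqueues one row."""
    row = [data_unit, data_unit * 2]   # stand-in for the real formatted data
    row_queue.put(row)

t = threading.Thread(target=writer, args=("out.csv",))
t.start()
with futures.ThreadPoolExecutor(max_workers=8) as executor:
    list(executor.map(worker, range(100)))
row_queue.put(SENTINEL)   # all producers done; let the writer exit
t.join()
```

The queue still locks on every `put`/`get`, but the critical section is tiny (a list append) compared to holding a lock across a `csv.writer.writerow` call, so API-bound threads spend almost no time blocked.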

Thibaut D.