Is there a risk of losing data when multiple threads are extending values to the same list object? [Python]

Question

In my solution I have a list object called output_list. I am parsing product structure tree data from an API. Because the API calls are I/O-bound and time-consuming, I am using concurrent.futures to speed up the process.

import concurrent.futures
import traceback

output_list = []
input_list = [...]  # List of products to fetch data for.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    result_future = {executor.submit(breakdown, product, output_list, log_file): product for product in input_list}
    for future in concurrent.futures.as_completed(result_future):
        try:
            future.result()  # Re-raises any exception raised inside breakdown.
        except Exception:
            log_file.write(traceback.format_exc())
            raise
    list_to_json_blob(output_list)  # Function to transform output_list to a JSON blob.


def breakdown(product, output_list, log_file):
    xml_data = api_function(product)  # Function that fetches product structure data, one level down.
    output_list.extend([product])  # Extend the shared output list.
    sub_products = find_subproducts(xml_data)  # Returns sub-products; empty list at the bottom of the tree.
    for sub_product in sub_products:
        breakdown(sub_product, output_list, log_file)

Thus I will have multiple threads extending the same list object in the recursive function. Is there any risk involved in doing so? If so, what would be the best practice to achieve the same purpose?

score 1 · Accepted Answer · answered Apr 17 '20 at 12:34

In CPython, built-in lists, and therefore single calls to the extend method, are thread safe, so no items are lost when several threads extend the same list (see also this question).
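As a quick sanity check of that claim, here is a minimal sketch, assuming CPython. The names worker and shared and the two constants are made up for illustration and are not part of the question's code. Several threads extend one list, and the final length is compared with the expected count. A passing check demonstrates the behaviour but does not prove it.

```python
import concurrent.futures

N_THREADS = 4              # hypothetical values chosen for this demo
ITEMS_PER_THREAD = 10_000

shared = []

def worker(thread_id):
    # Extend the shared list one item at a time, as breakdown does.
    for i in range(ITEMS_PER_THREAD):
        shared.extend([(thread_id, i)])

with concurrent.futures.ThreadPoolExecutor(max_workers=N_THREADS) as executor:
    # list() waits for every task and re-raises any worker exception.
    list(executor.map(worker, range(N_THREADS)))

# Every item arrived, although the order across threads is not guaranteed.
print(len(shared) == N_THREADS * ITEMS_PER_THREAD)
```

Note that this only covers a single extend call. A sequence such as "check the length, then extend" is not atomic and would still need a threading.Lock.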

The order of the items, however, might not be what you expect. When this code runs single-threaded, all subproducts follow directly after their main product.

With multithreading, the products and subproducts handled by the individual threads will be interleaved.

If you want to maintain the order of the elements, you might be better off creating one list per thread and joining them together once all threads have finished.
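One way to sketch this idea is to give each submitted top-level product its own list (one list per task rather than literally per thread) and concatenate the lists in input order. The TREE dictionary below is a hypothetical stand-in for api_function and find_subproducts, and breakdown_root is an illustrative helper name, not something from the question.

```python
import concurrent.futures

# Hypothetical stand-in for the API: maps a product to its direct sub-products.
TREE = {"A": ["A1", "A2"], "A1": ["A1a"], "B": ["B1"]}

def breakdown(product, local_list):
    # Depth-first walk that only touches the list owned by this task.
    local_list.append(product)
    for sub_product in TREE.get(product, []):
        breakdown(sub_product, local_list)

def breakdown_root(product):
    # Each top-level product gets a private list, so nothing is shared.
    local_list = []
    breakdown(product, local_list)
    return local_list

input_list = ["A", "B"]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map yields results in the order of input_list,
    # regardless of which task finishes first.
    per_product = list(executor.map(breakdown_root, input_list))

# Join the per-task lists into one ordered output list.
output_list = [item for chunk in per_product for item in chunk]
print(output_list)  # ['A', 'A1', 'A1a', 'A2', 'B', 'B1']
```

Because no list is shared between tasks, each product is followed directly by its own subproducts, exactly as in the single-threaded run.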

