
Will there be a race condition in getUrl if I run this using threads? I am changing the value of data['key'] from multiple threads. I need to pass the entire data dict to each request I make: a set of keys stays fixed, and only the key named key changes for each threaded call.

import requests
from concurrent.futures import ThreadPoolExecutor


def getUrl(url, value):
    data['key'] = value # will there be a race condition here
    return requests.get(url, data=data).text # and when the value is passed here

data = {'key': 1, 'fixedKey': 'fixedValue', 'fixedKey2': 'fixedValue2'}
resultArray = []
threadPool = ThreadPoolExecutor(32)

for i in range(100):
    resultArray.append(threadPool.submit(getUrl, 'https://google.com', i))

I checked "Thread Safety in Python's dictionary", but my confusion is this: can the thread be switched out right after I set data['key'] = value, so that another thread changes the value and the next line then sees what that other thread set?

Example

Value set by thread 1
data['key'] = 1
Context Switch
Value set by thread 2
data['key'] = 2
Context Switch back to old thread 1
Is data['key'] now `2`? I need it to still be `1`.

If I use locks then I will lose the concurrency here.

Jake

2 Answers


There is a race condition because data is shared between threads and you mutate it from them.

An easy way to visualize this race condition is to mock the request call with a sleep and print:

from concurrent.futures import ThreadPoolExecutor
from time import sleep

def getUrl(url, value):
    data['key'] = value
    sleep(0.3)
    print(value, data["key"])

data = {'key': 1, 'fixedKey': 'fixedValue', 'fixedKey2': 'fixedValue2'}
resultArray = []
threadPool = ThreadPoolExecutor(32)

for i in range(100):
    resultArray.append(threadPool.submit(getUrl, 'https://google.com', i))

You'll see that the value passed to getUrl is sometimes different from the one stored in data.

A solution is to copy data locally before mutating it.
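As a rough sketch of that fix, the same sleep-based mock can build a per-call copy instead of writing to the shared dict. The name get_url_mock is hypothetical and no real request is made; the sleep only stands in for network latency.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep

data = {'key': 1, 'fixedKey': 'fixedValue', 'fixedKey2': 'fixedValue2'}

def get_url_mock(url, value):
    # Hypothetical stand-in for getUrl: no network call is made.
    # Build a new dict per call; the shared `data` is only read, never written.
    local_data = dict(data, key=value)
    sleep(0.01)  # simulate request latency so the threads interleave
    return value, local_data['key']

with ThreadPoolExecutor(32) as pool:
    futures = [pool.submit(get_url_mock, 'https://google.com', i) for i in range(100)]
    results = [f.result() for f in futures]

# Every call should see exactly the value it was given.
mismatches = [(value, seen) for value, seen in results if value != seen]
print(len(mismatches))  # 0
```

Because each call works on its own dict, the value passed in and the value read back always agree, and the global data is left untouched.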

Louis Lac
  • Thank you @louislac. Unfortunately I cannot upvote your answer because I do not have enough points, but your way of visualizing the race condition was new to me and is something to remember. – Jake Sep 21 '21 at 08:02

Your current code does indeed have a race condition, because you are modifying and reading the same global variable from different threads. A simple fix is to use a local copy:

def getUrl(url, value):
    data2 = dict(data, key=value)             # build a local copy with 'key' overridden
    return requests.get(url, data=data2).text # and use it: no race condition
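
For reference, here is a small sketch of two equivalent ways to build such a copy. The keyword form of dict() only works when the key is a string that is a valid Python identifier; the unpacking form works for any key. The value 42 is just an illustrative placeholder.

```python
data = {'key': 1, 'fixedKey': 'fixedValue', 'fixedKey2': 'fixedValue2'}

# Keyword form: copies `data`, then overrides 'key'.
data2 = dict(data, key=42)

# Unpacking form: same result, and also usable for keys that are not identifiers.
data3 = {**data, 'key': 42}

print(data2 == data3)  # True
print(data['key'])     # 1, the original dict is not modified
```

Both forms make a shallow copy, which is enough here because only a top-level key is replaced.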
Serge Ballesta
  • thank you, so each threaded call will have its own copy of the function scope variables then? before I accept this nice suggestion, will creating a copy for each iteration be something that is memory intensive? – Jake Sep 21 '21 at 07:56
  • @Jake: Yes, variables created before the thread starts are shared, and variables created in the thread are local to the thread (each thread has its own copy). Having one copy per thread does consume memory, but in a sense you are trading memory for speed, which is a common trade-off... – Serge Ballesta Sep 21 '21 at 07:59
  • Do I need to use `with lock: data2 = dict(data, key=value)`, or is no lock needed for that? – Jake Sep 21 '21 at 08:14
  • No lock should be needed as the original/global dict is never modified. – Holloway Sep 21 '21 at 08:18