
I've got an application which fetches results from several URLs and then has to make a decision based on them (e.g. pick the best result and display it to the user). Since I want to check several URLs, this is the first time I've really needed multithreading.

So with the help of some examples I cooked up the following test code:

import threading
import urllib2

threadsList = []
theResultList = []

def get_url(url):
    result = urllib2.urlopen(url).read()
    theResultList.append(result[0:10])

theUrls = ['http://google.com', 'http://yahoo.com']

for u in theUrls:
    t = threading.Thread(target=get_url, args=(u,))
    threadsList.append(t)
    t.start()
    t.join()

print theResultList

This seems to work, but I'm really unsure here because I have virtually no experience with multithreading. I always hear terms like "thread safe" and "race condition".

Of course I've read about these things, but since this is my first time using something like this, my question is: is it OK to do it like this? Are there any negative or unexpected effects that I'm overlooking? Are there ways to improve this?

All tips are welcome!

kramer65
  • As a side note, your code here is probably not doing what you think it is doing. The t.join() call at the end of the for loop will force the main thread to wait until that child thread finishes--*before* it continues the loop and starts the next thread. What you probably want to do is create an empty list of threads prior to the loop, then in the loop add each thread object to that list. Only *after* you have started each thread, then you should loop over the list and join to each of them. – GrandOpener Dec 08 '14 at 22:01
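
The pattern described in this comment could look roughly like the sketch below. Note that `fake_fetch` is a hypothetical stand-in for `urllib2.urlopen(url).read()` (it just sleeps instead of touching the network), and the names `results` and `threads` are made up for the illustration:

```python
import threading
import time

results = []

def fake_fetch(url):
    # Hypothetical stand-in for urllib2.urlopen(url).read():
    # sleep briefly to simulate network latency, then record a result.
    time.sleep(0.1)
    results.append(url[:10])

urls = ['http://google.com', 'http://yahoo.com']

# First loop: create and start every thread without waiting for any of them.
threads = []
for u in urls:
    t = threading.Thread(target=fake_fetch, args=(u,))
    threads.append(t)
    t.start()

# Second loop: only now wait for all of them to finish.
for t in threads:
    t.join()

# Completion order is not guaranteed, so sort before printing.
print(sorted(results))
```

With the join moved into its own loop, the simulated downloads overlap instead of running one after another.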

1 Answer


You have to worry about race conditions when you have multiple threads modifying the same object. In your case you have this exact condition - all threads are modifying theResultList.

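However, in CPython a single list operation such as append is atomic because of the global interpreter lock (GIL), so lists are thread safe in that sense. Appends to a list from multiple threads will not corrupt the list structure, although the order in which the results arrive is not guaranteed. You still have to take care to protect concurrent modifications to individual list elements, however. For example: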

# not thread safe code! - all threads modifying the same element
def get_url(url):
    result = urllib2.urlopen(url).read()

    #in this example, theResultList is a list of integers
    theResultList[0] += 1

In your case, you aren't doing something like this, so your code is fine.
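
As a rough illustration of the safe case, here is a small sketch in which several threads only ever append to a shared list. The names `shared` and `append_many` are invented for the example, and the guarantee it leans on is CPython behaviour rather than something the language specification promises:

```python
import threading

shared = []

def append_many(tag, n):
    # Each thread only appends; it never modifies an existing element.
    for i in range(n):
        shared.append((tag, i))

threads = [threading.Thread(target=append_many, args=(k, 1000))
           for k in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every append should have landed, although the interleaving order may vary.
print(len(shared))
```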

Side note: The reason incrementing an integer isn't thread safe is that it's actually several operations: read the current value, add one to it, and store the result back. A thread can be interrupted between these steps (by another thread that also wants to increment the same variable). This means that when the first thread finally stores its result, it may be writing a value computed from out-of-date data, and the other thread's increment is lost.
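
If you ever do need several threads to update the same element, one common approach is to guard the update with a `threading.Lock`. The following is a minimal sketch, not the only way to do it; `counter`, `counter_lock` and `increment_many` are hypothetical names chosen for the example:

```python
import threading

counter = [0]                     # shared list with a single integer element
counter_lock = threading.Lock()   # protects every update to counter[0]

def increment_many(n):
    for _ in range(n):
        # Holding the lock makes read, add and store happen as one unit.
        with counter_lock:
            counter[0] += 1

threads = [threading.Thread(target=increment_many, args=(10000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter[0])
```

Because only one thread can hold the lock at a time, no increment is lost, at the cost of the threads taking turns on that section.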

Martin Konecny