
With respect to these amazing answers and this blog post, I still have a small question. The blog post concludes from its benchmarks that, because of the GIL, threading is slower than running without threads:

              simple   threading   multiprocessing
threads = 2    4.124       5.539             2.034
threads = 3    6.391      13.772             3.376
threads = 4    9.194      17.641             4.720

So threading is even slower than simple execution. This is understood from the behaviour of GIL discussed above and should not surprise us now.

I benchmarked my own function (scraping data and writing it to a file) in the same manner as in the post, and got the following results:

simple: 15 min, threading: 10 min, multiprocessing: 5 min.

So why can threading be faster than the simple method without any threading?
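One way to probe this is to time a task that mostly waits separately from one that mostly computes. The following is only a minimal sketch, not my actual scraper: `io_task` and `cpu_task` are hypothetical stand-ins, and it assumes that `time.sleep` is a fair substitute for waiting on a network response (both release the GIL in CPython while waiting).

```python
import threading
import time


def io_task(delay=0.2):
    # Hypothetical stand-in for a network request: sleeping releases
    # the GIL, much like blocking on a socket does in CPython.
    time.sleep(delay)


def cpu_task(n=200_000):
    # Hypothetical stand-in for CPU-bound work: pure-Python arithmetic
    # that holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(task, count=4):
    # Run the task `count` times, one after another, and return the elapsed seconds.
    start = time.perf_counter()
    for _ in range(count):
        task()
    return time.perf_counter() - start


def run_threaded(task, count=4):
    # Run the task in `count` threads at once and return the elapsed seconds.
    start = time.perf_counter()
    threads = [threading.Thread(target=task) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for name, task in (("io", io_task), ("cpu", cpu_task)):
        print(name,
              "sequential:", round(run_sequential(task), 3),
              "threaded:", round(run_threaded(task), 3))
```

If the `io` variant speeds up under threads while the `cpu` variant does not, that would be consistent with a scraping function that spends most of its time waiting on the network rather than computing.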

EDIT: A short description of the functions

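import multiprocessing
# from threading import Thread, current_thread  # threading variant

# extract_data, write_data and ranging are defined elsewhere (not shown).

def perform_extraction(ranges):
    # Each worker writes to a file named after itself.
    thread_name = multiprocessing.current_process().name
    # thread_name = current_thread().name  # threading variant
    for page in ranges:
        data = extract_data(page)
        write_data(data, thread_name + '.txt')

processes = []
for thread in range(4):
    process = multiprocessing.Process(name=str(thread), target=perform_extraction, args=(ranging[thread],))
    # process = Thread(name=str(thread), target=perform_extraction, args=(ranging[thread],))  # threading variant
    process.start()
    processes.append(process)

# Wait for all workers to finish.
for process in processes:
    process.join()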
Alina
  • Did you run the tests multiple times? What does your function look like? – CristiFati Nov 22 '17 at 14:08
  • Yes, several times. I have a function that extracts the data and writes it to a file. Each process/thread has its own name and its own range of pages; it extracts only those pages and writes only to the file with the corresponding name. So 4 different threads/processes get 4 different page ranges and write to 4 different files. – Alina Nov 22 '17 at 14:10
  • Try eliminating the _I/O_ (the part that writes to the file) and see what happens. – CristiFati Nov 22 '17 at 14:14
  • @CristiFati The time is almost the same. Please check the edit in the question, where I show what the function looks like. – Alina Nov 22 '17 at 16:52

0 Answers