I need to check at least 20k URLs to see whether each one is up and save some data about it in a database.
I already know how to check whether a URL is online and how to save data to the database, but without concurrency it will take ages to check them all. So what's the fastest way to check thousands of URLs?
I am following this tutorial: https://realpython.com/python-concurrency/ and it seems that the "CPU-Bound multiprocessing Version" is the fastest approach, but I want to know whether that really is the fastest way or whether there are better options.
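To make it concrete, here is a minimal sketch of the threaded version I have in mind (assuming the requests library; check_url, the worker count, and the print placeholder standing in for the database write are just illustrative):

```python
import concurrent.futures
import requests

def check_url(url):
    """Return (url, is_up) using a HEAD request with a short timeout."""
    try:
        response = requests.head(url, timeout=5, allow_redirects=True)
        return url, response.status_code < 400
    except requests.RequestException:
        return url, False

def check_all(urls, workers=40):
    # one worker thread per concurrent request, capped at `workers`
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        for url, is_up in executor.map(check_url, urls):
            # this is where I would save (url, is_up) to the database
            print(url, is_up)
```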
Edit:
Based on the replies, I am updating the post with a comparison of multiprocessing and multithreading.
Example 1: Print "Hello!" 40 times
Threading
- With 1 thread: 20.152419090270996 seconds
- With 2 threads: 10.061403036117554 seconds
- With 4 threads: 5.040558815002441 seconds
- With 8 threads: 2.515489101409912 seconds
Multiprocessing with 8 cores:
- It took 3.1343798637390137 seconds
With 8 threads, threading is faster than multiprocessing here.
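For reference, the toy benchmark looked roughly like this. It is only a sketch, not the exact script, and it assumes each "Hello!" is followed by a 0.5-second sleep, which is what would produce the roughly 20 seconds with 1 thread shown above:

```python
import time
import threading
import multiprocessing

def say_hello(_):
    print("Hello!")
    time.sleep(0.5)  # simulated work: 40 calls on 1 thread take about 20 seconds

def run_threads(n_threads, n_tasks=40):
    # split the 40 tasks evenly across n_threads worker threads
    def worker(count):
        for _ in range(count):
            say_hello(None)
    per_thread = n_tasks // n_threads
    threads = [threading.Thread(target=worker, args=(per_thread,))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{n_threads} threads: {time.time() - start} seconds")

def run_processes(n_procs=8, n_tasks=40):
    start = time.time()
    with multiprocessing.Pool(n_procs) as pool:
        pool.map(say_hello, range(n_tasks))
    print(f"{n_procs} processes: {time.time() - start} seconds")

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        run_threads(n)
    run_processes()
```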
Example 2: the problem posed in my question (checking URLs)
After several tests, threading becomes faster once you use more than about 12 threads. For example, to check 40 URLs, threading with 40 threads is about 50% faster than multiprocessing with 8 cores.
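The comparison for the URL case looked roughly like this (again only a sketch: it assumes the requests library, reuses the same illustrative check_url from above, and uses example.com as a placeholder for the real 40 URLs):

```python
import time
import concurrent.futures
import multiprocessing
import requests

def check_url(url):
    try:
        return url, requests.head(url, timeout=5).status_code < 400
    except requests.RequestException:
        return url, False

def with_threads(urls, workers=40):
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(check_url, urls))

def with_processes(urls, cores=8):
    with multiprocessing.Pool(cores) as pool:
        return pool.map(check_url, urls)

if __name__ == "__main__":
    urls = ["https://example.com"] * 40  # placeholder: 40 URLs to test
    for name, func in (("threading, 40 threads", with_threads),
                       ("multiprocessing, 8 cores", with_processes)):
        start = time.time()
        func(urls)
        print(f"{name}: {time.time() - start:.2f} seconds")
```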
Thanks for your help