Let's say I have a dataframe full of data, with a column containing different URLs, and I want to scrape a price from the page at each URL in the dataframe (which is pretty big, more than 15k rows). I want this scraping to run continuously: when it reaches the end of the URLs, it starts over again and again. The last column of the dataframe (Price) would be updated every time a price is scraped.
Here is a visual example of a toy dataframe:
Col 1   ...   Col N   URL                             Price
XXXX    ...   XXXXX   http://www.some-website1.com/   23,5$
XXXX    ...   XXXXX   http://www.some-website2.com/   233,5$
XXXX    ...   XXXXX   http://www.some-website3.com/   5$
XXXX    ...   XXXXX   http://www.some-website4.com/   2$
...
My question is: what is the most efficient way to scrape those URLs using a parallel method (multi-threading, ...), knowing that I can implement the solution with requests/Selenium/bs4, ...? (I can learn pretty much anything.) So I would like a theoretical answer more than some lines of code, but if you have a block to send, don't hesitate :) For context, here is a rough sketch of the kind of structure I have in mind, shown below.
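This is only a minimal sketch under assumptions, not a working solution: it uses requests + bs4 with a `ThreadPoolExecutor`, the `.price` CSS selector is a hypothetical placeholder (every site would need its own extraction logic), and the toy dataframe stands in for my real 15k-row one:

```python
import concurrent.futures

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Toy dataframe standing in for the real 15k-row one (hypothetical data).
df = pd.DataFrame({
    "URL": [
        "http://www.some-website1.com/",
        "http://www.some-website2.com/",
    ],
    "Price": [None, None],
})

def fetch_price(url):
    """Fetch one page and extract a price string, or None on failure."""
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        tag = soup.select_one(".price")  # placeholder selector, site-specific
        return tag.get_text(strip=True) if tag else None
    except requests.RequestException:
        return None  # keep the previous value on a network error

def scrape_forever(df, max_workers=20):
    """Scrape all URLs in parallel, write prices back, then start over."""
    while True:
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() yields results in input order, so they line up with df rows;
            # only the main thread writes to the dataframe, so no locking needed.
            for idx, price in zip(df.index, pool.map(fetch_price, df["URL"])):
                if price is not None:
                    df.at[idx, "Price"] = price

scrape_forever(df)
```

I picked threads here on the assumption that the work is I/O-bound (waiting on HTTP responses), so the GIL wouldn't be the bottleneck, but I'm open to multiprocessing or async if that's the better theory.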
Thank you