
I am using the following code to expand the shortened URLs in a column of a very large dataframe. It is extremely slow. How can I do the same thing in a more efficient way?

Thanks!

Here is my code:

import requests

# Keep only the rows whose URL is a trib.al short link
trib = df[df['url'].str.contains('https://trib.al/', na=False)]
expand = trib['url'].tolist()

# Resolve each short URL and substitute the expanded version back into the column
for short_url in expand:
    r = requests.get(short_url)
    df['url'] = df['url'].str.replace(short_url, r.url, regex=False)

1 Answer


Looking at your code, I don't think there is much room for improvement on the pandas side. You could wrap the lookup in a function and use pd.Series.apply for a small speed-up, but the slow part is almost certainly the request itself, since each one has to make a round trip over the internet. What you can do is parallelize the requests, e.g. with multiprocessing, and thereby finish your list faster; a sketch follows below. Be careful not to send too many requests in parallel, though, since you might get blocked if you cause excessive server load.
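A minimal sketch of that idea, assuming df is already loaded as in your question; the pool size of 8 and the 10-second timeout are illustrative choices, not tuned values:

import requests
from multiprocessing import Pool

def expand_url(short_url):
    # Follow redirects and return the final URL; fall back to the
    # original so one bad link doesn't abort the whole run.
    try:
        return requests.get(short_url, timeout=10).url
    except requests.RequestException:
        return short_url

if __name__ == '__main__':
    mask = df['url'].str.contains('https://trib.al/', na=False)
    short_urls = df.loc[mask, 'url'].tolist()

    # A small pool keeps the server load reasonable
    with Pool(processes=8) as pool:
        expanded = pool.map(expand_url, short_urls)

    # One vectorized assignment instead of a str.replace per URL
    df.loc[mask, 'url'] = expanded

Since the work is I/O-bound rather than CPU-bound, a thread pool would also do the job (multiprocessing.dummy.Pool has the same interface) and avoids the cost of spawning processes.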
