I am trying to iterate over a pandas DataFrame with close to a million rows, using a for loop. Consider the following code as an example:
import pandas as pd
import os
from requests_html import HTMLSession
from tqdm import tqdm
import time

df = pd.read_csv(os.getcwd() + '/test-urls.csv')
df = df.drop('Unnamed: 0', axis=1)
new_df = pd.DataFrame(columns=['pid', 'orig_url', 'hosted_url'])
refused_df = pd.DataFrame(columns=['pid', 'refused_url'])
tic = time.time()
for idx, row in df.iterrows():
    img_id = row['pid']
    url = row['image_url']
    # Let's do the scraping
    session = HTMLSession()
    r = session.get(url)
    r.html.render(sleep=1, keep_page=True, scrolldown=1)
    count = 0
    link_vals = r.html.find('.zoomable')
    if len(link_vals) != 0:
        attrs = link_vals[0].attrs
        # print(attrs['src'])
        embed_link = attrs['src']
    else:
        while count <= 7:
            link_vals = r.html.find('.zoomable')
            count += 1
        else:
            print('Link refused connection for 7 tries. Adding URL to Refused URLs Data Frame')
            ref_val = [img_id, url]
            len_ref = len(refused_df)
            refused_df.loc[len_ref] = ref_val
            print('Refused URL added')
            # Skip to the next row of the outer for loop
            continue
    print('Got 1 link')
    # Append scraped data to new_df
    len_df = len(new_df)
    append_value = [img_id, url, embed_link]
    new_df.loc[len_df] = append_value
I would like to know how I can add a progress bar to show how far along the loop is. Any help is appreciated. Please let me know if anything needs clarification.
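
For reference, here is a minimal sketch of the kind of thing being asked for, using the tqdm package that the code above already imports. The small DataFrame and the `process_rows` helper are hypothetical stand-ins for the real CSV and the scraping step. Because `iterrows()` is a generator and has no length, the sketch assumes `total=len(frame)` has to be passed so the bar can show a percentage.

```python
import pandas as pd
from tqdm import tqdm

# Hypothetical stand-in for test-urls.csv, with the same column names.
df = pd.DataFrame({
    'pid': [1, 2, 3],
    'image_url': [
        'https://example.com/a',
        'https://example.com/b',
        'https://example.com/c',
    ],
})


def process_rows(frame, out=None):
    """Loop over the rows of frame behind a progress bar; return the pids seen."""
    seen = []
    # iterrows() yields (index, row) pairs but has no len(), so the row
    # count is supplied through total=. file=None means tqdm writes to stderr.
    for idx, row in tqdm(frame.iterrows(), total=len(frame), file=out):
        seen.append(row['pid'])  # the scraping step would go here
    return seen


visited = process_rows(df)
```

In the original loop this would amount to changing only the `for` line to wrap `df.iterrows()` in `tqdm(..., total=len(df))`. The existing `print` calls may break up the bar's display, and `tqdm.write` is one alternative for those messages.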