I wrote a simple function that processes a list of links and extracts useful information from them. Inside the function, I want a print
call that shows which element is currently being processed, updating on a single line. However, the output is not what I expected. Here are my list and my code.
links = ['https://www.theguardian.com/world/2020/nov/18/test-and-trace',
         'https://www.theguardian.com/world/2000/jan/27/3',
         'https://www.theguardian.com/world/2020/nov/14/israeli-agents-in-iran-kill',
         'https://www.theguardian.com/world/2020/nov/10/nagorno-karabakh-peace-deal',
         'https://www.theguardian.com/world/2020/dec/06/professor-neil-ferguson',
         'https://www.theguardian.com/world/2020/nov/15/south-australia-records-three',
         'https://www.theguardian.com/world/2000/feb/28/gender.uk2']
and my code:
import pandas as pd

def tidy_links(links):
    # make an empty dataframe for putting all links in a tidy manner
    df = pd.DataFrame(columns=['cat', 'year', 'month', 'day', 'url', 'name'])
    # loop over links
    for i in range(len(links)):
        print('Processing link number ', i, 'out of', len(links), end = '\r')
        # add the data to the dataframe: the last five URL segments are
        # category, year, month, day and article slug
        s = links[i].split('/')
        name = s[-5] + '_' + s[-4] + '_' + s[-3] + '_' + s[-2] + '_' + s[-1]
        df.loc[len(df)] = [s[-5], s[-4], s[-3], s[-2], links[i], name]
    return df
and this is the output:
df = tidy_links(links)
Processing link number 3060 out of 3061 0108 out of 3061 0238 out of 3061 0494 out of 3061 0802 out of 3061 1186 out of 3061 2265 out of 3061
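For comparison, here is a minimal sketch of the single-line progress display described above. It writes the carriage return before the message rather than after it, pads the counter to a fixed width, and flushes after each write. The names `format_progress` and `show_progress` are hypothetical helpers, not part of the code above, and whether the line actually updates in place may still depend on how the console or notebook front end handles `'\r'`.

```python
import sys

def format_progress(current, total):
    """Build a progress message whose length is the same for every counter value."""
    width = len(str(total))  # pad the counter to the width of the total
    return f"Processing link number {current:>{width}} out of {total}"

def show_progress(current, total, stream=sys.stdout):
    """Rewrite the current line with the latest progress message."""
    # A leading '\r' moves the cursor to the start of the line before writing.
    stream.write('\r' + format_progress(current, total))
    stream.flush()  # push the text out immediately instead of waiting for a newline

# Small sample taken from the list above
sample_links = [
    'https://www.theguardian.com/world/2020/nov/18/test-and-trace',
    'https://www.theguardian.com/world/2000/jan/27/3',
    'https://www.theguardian.com/world/2000/feb/28/gender.uk2',
]

# Count from 1 so the final message reads "3 out of 3"
for i, link in enumerate(sample_links, start=1):
    show_progress(i, len(sample_links))
print()  # end the progress line once the loop is done
```

Because every message has the same length, each write fully covers the previous one, so no leftover characters from an earlier message can remain on the line.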