I'm having trouble counting the number of occurrences of each URL in my log file. I have one working approach, but I'm sure it can be done better:
import pandas as pd
import numpy as np
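# Sample log data: each row is one request, identified by url_id and url
df = pd.DataFrame({
    'url_id': [1, 2, 3, 4, 2, 2, 4],
    'url': ['microsoft.com', 'yandex.ru', 'google.com', 'bbc.com',
            'yandex.ru', 'yandex.ru', 'bbc.com'],
})
# Add a helper column of ones so that summing it per group yields a count
df['dummy'] = 1
print(df.groupby(['url_id', 'url'])['dummy'].sum())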
The output is:
url_id  url
1       microsoft.com    1
2       yandex.ru        3
3       google.com       1
4       bbc.com          2
Name: dummy, dtype: int64
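One direction that might avoid the helper `dummy` column is `groupby(...).size()`, which counts the rows in each group directly. The following is only a minimal sketch on the same sample data; the names `counts` and `counts_df` and the column name `count` are placeholders chosen for illustration, not part of the original code.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'url_id': [1, 2, 3, 4, 2, 2, 4],
    'url': ['microsoft.com', 'yandex.ru', 'google.com', 'bbc.com',
            'yandex.ru', 'yandex.ru', 'bbc.com'],
})

# size() returns the number of rows per (url_id, url) group,
# so no helper column of ones is needed
counts = df.groupby(['url_id', 'url']).size()
print(counts)

# Optionally turn the result into a flat DataFrame with a named count column
# ('count' is an arbitrary, illustrative column name)
counts_df = counts.reset_index(name='count')
print(counts_df)
```

On this sample the per-group counts should match the `dummy` sum above; whether it is actually faster on a large log file would need to be measured.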