I'm having trouble counting the number of occurrences of each URL in my log file. I have one working variant, but I'm sure it can be done better:

import pandas as pd
import numpy as np

df = pd.DataFrame({'url_id' : [1,2,3,4,2,2,4], 'url' : ['microsoft.com', 'yandex.ru', 'google.com', 'bbc.com', 'yandex.ru', 'yandex.ru', 'bbc.com']})
df['dummy'] = 1
print(df.groupby(['url_id', 'url'])['dummy'].sum())

Output is:

url_id  url          
1       microsoft.com    1
2       yandex.ru        3
3       google.com       1
4       bbc.com          2
Name: dummy, dtype: int64
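
One possible simplification, sketched under the assumption that only the number of rows per (url_id, url) pair is needed: `groupby(...).size()` counts the rows in each group directly, so the helper `dummy` column can be dropped. The column name `count` below is just an illustrative choice, not something from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'url_id': [1, 2, 3, 4, 2, 2, 4],
    'url': ['microsoft.com', 'yandex.ru', 'google.com', 'bbc.com',
            'yandex.ru', 'yandex.ru', 'bbc.com'],
})

# size() returns the number of rows in each group, so no helper column is needed
counts = df.groupby(['url_id', 'url']).size()
print(counts)

# Optionally flatten the MultiIndex into regular columns;
# 'count' is an arbitrary name chosen for this sketch
counts_df = counts.reset_index(name='count')
print(counts_df)
```

If `url_id` and `url` always map one-to-one, grouping by either column alone should give the same counts.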

Roman Kazmin
