You can use the collections module to make this very clean. This solution uses a defaultdict to auto-create a new Counter whenever a new user is seen, then adds one to that user's counter for every redirect.
At the end of the "read from file" loop, we have a data structure that looks like {user: {(url1, url2): count}}. This organization makes everything easy to print in the second loop, where each count is divided by the user's total number of redirects.
from collections import Counter, defaultdict

# Maps each user to a Counter keyed by (source_url, target_url) pairs.
users_to_stats = defaultdict(Counter)

with open('tmp.txt') as fp:
    for line in fp:
        # Each line is expected to hold three whitespace-separated fields.
        user, url1, url2 = line.split()
        users_to_stats[user][(url1, url2)] += 1

for user, counts in users_to_stats.items():
    print(user)
    total_redirects_per_user = sum(counts.values())
    for (url1, url2), count in counts.items():
        # Fraction of this user's redirects that follow url1 -> url2.
        print(f'{url1} -> {url2} : {count / total_redirects_per_user}')
Prints:
A
url1 -> url2 : 0.5
url1 -> url3 : 0.25
url2 -> url3 : 0.25
B
url1 -> url3 : 1.0
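
To see the intermediate {user : {(url1, url2): count}} structure directly, here is a minimal sketch that builds it from in-memory lines instead of a file. The contents of tmp.txt are not shown, so sample_lines is an assumption: hypothetical input chosen to be consistent with the output printed above. The name as_plain_dicts is also just for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample input (an assumption, since tmp.txt is not shown).
# It is chosen so that the fractions match the output printed above.
sample_lines = [
    "A url1 url2",
    "A url1 url2",
    "A url1 url3",
    "A url2 url3",
    "B url1 url3",
]

users_to_stats = defaultdict(Counter)
for line in sample_lines:
    user, url1, url2 = line.split()
    users_to_stats[user][(url1, url2)] += 1

# Convert to plain dicts so the nested shape is easy to read when printed.
as_plain_dicts = {user: dict(counts) for user, counts in users_to_stats.items()}
print(as_plain_dicts)
```

With this sample, user A has four redirects in total, so the pair (url1, url2) with a count of 2 gives the 0.5 seen in the output.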