I have a dataset like this:
import pandas as pd

d = pd.DataFrame({
    'users_list': [["us1", "us2", "us3", "us5", "us5"], ["us2", "us3", "us2"]],
    'users_tuples': [[('us1', 'us2'), ('us2', 'us3'), ('us5', 'us1'), ('us5', 'us1')], [('us2', 'us3'), ('us3', 'us2')]]})
First I get a list of all users without repetition, like this:
all_users = sorted(list(set(sum([x for x in d['users_list']],[]))))
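Since `sum(..., [])` copies the growing list at every step, the same list of unique users could probably also be built with `Series.explode`. This is only a minimal sketch, assuming pandas 0.25 or later and that every cell of `users_list` is a plain Python list:

```python
import pandas as pd

d = pd.DataFrame({
    'users_list': [["us1", "us2", "us3", "us5", "us5"], ["us2", "us3", "us2"]]})

# explode() puts every list element on its own row; unique() drops the repeats
all_users = sorted(d['users_list'].explode().unique())
```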
Then I add one column per user, like this:
for us in all_users:
    d[us] = d.apply(lambda x: [1 if (a, us) in x['users_tuples'] else 0 for a in x['users_list']], axis=1)
But each cell of the result is a list:
us1              us2              us3              us5
[0, 0, 0, 1, 1]  [1, 0, 0, 0, 0]  [0, 1, 0, 0, 0]  [0, 0, 0, 0, 0]
[0, 0, 0]        [0, 1, 0]        [1, 0, 1]        [0, 0, 0]
What I want is the sum of each of these lists, so the result would be:
us1  us2  us3  us5
2    1    1    0
0    1    2    0
I know I can get this with:
for us in all_users:
    d[us] = d.apply(lambda x: sum([1 if (a, us) in x['users_tuples'] else 0 for a in x['users_list']]), axis=1)
But I don't think this is efficient: it runs a row-wise apply once per user, and each apply scans the tuple list for every element of users_list. Is there a more efficient way to do these transformations?
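One possible direction, offered as a rough sketch rather than a tested solution: make a single pass over each row and count with `collections.Counter` instead of calling `apply` once per user. For each distinct tuple `(a, b)`, user `b` gains as many counts as `a` appears in `users_list`, which should match the sums above. The helper name `count_row` is hypothetical, and whether this is actually faster on real data is an assumption I have not measured.

```python
from collections import Counter

import pandas as pd

d = pd.DataFrame({
    'users_list': [["us1", "us2", "us3", "us5", "us5"], ["us2", "us3", "us2"]],
    'users_tuples': [[('us1', 'us2'), ('us2', 'us3'), ('us5', 'us1'), ('us5', 'us1')],
                     [('us2', 'us3'), ('us3', 'us2')]]})

all_users = sorted(set().union(*d['users_list']))

def count_row(users_list, users_tuples):
    """Hypothetical helper: per-user counts for one row."""
    multiplicity = Counter(users_list)   # how often each user occurs in the list
    counts = Counter()
    for a, b in set(users_tuples):       # duplicates in the tuple list do not matter
        counts[b] += multiplicity[a]     # every occurrence of `a` contributes 1 to `b`
    return counts

# One dict per row; a Counter returns 0 for users that never appear
rows = [
    {u: c[u] for u in all_users}
    for c in map(count_row, d['users_list'], d['users_tuples'])
]
counts = pd.DataFrame(rows, index=d.index, columns=all_users)

# Attach the per-user columns to the original frame
result = d.join(counts)
```

On the sample data, `counts` should contain the same numbers as the table of sums above.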