In Python I'm trying to count how many times each permutation (ordered pair) of categories occurs in a dataset. To be more clear, given the following DataFrame

   WEB_ID          Category
   12332405        a
   3763583         b
   7930245         c
   7930245         a
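To reproduce the examples, the sample frame can be built like this. This is a minimal sketch; storing WEB_ID as integers and Category as strings is an assumption, since the question shows only the printed table.

```python
import pandas as pd

# Sample data from the question: one row per (WEB_ID, Category) pair.
# Assumption: WEB_ID is an integer column, Category a string column.
df = pd.DataFrame({
    "WEB_ID": [12332405, 3763583, 7930245, 7930245],
    "Category": ["a", "b", "c", "a"],
})
print(df)
```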

I would like to get a DataFrame that looks like this:

s    t    q
a    b    0
a    c    1
b    a    0
b    c    0
c    a    1
c    b    0

This reads as: there are 0 IDs that have both category a and category b, 1 with a and c, 0 with b and a, and so on.

I've got as far as generating all permutations of the categories with the itertools module. I have some ideas, but they all look ugly and perform poorly.
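For comparison, a straightforward itertools-based version might look like the sketch below. It rebuilds the sample data, collects each ID's set of categories, and then, for every ordered pair from `itertools.permutations`, counts the IDs whose set contains both. The names `cats_per_id`, `rows`, and `out` are illustrative only. It is easy to read but loops over every pair for every ID, so it will scale poorly on large data.

```python
from itertools import permutations

import pandas as pd

df = pd.DataFrame({
    "WEB_ID": [12332405, 3763583, 7930245, 7930245],
    "Category": ["a", "b", "c", "a"],
})

# Map each WEB_ID to the set of categories it appears with
cats_per_id = {}
for web_id, cat in zip(df["WEB_ID"], df["Category"]):
    cats_per_id.setdefault(web_id, set()).add(cat)

# For every ordered pair (s, t) of distinct categories,
# count the IDs whose category set contains both s and t
rows = []
for s, t in permutations(sorted(df["Category"].unique()), 2):
    q = sum(1 for cats in cats_per_id.values() if s in cats and t in cats)
    rows.append((s, t, q))

out = pd.DataFrame(rows, columns=["s", "t", "q"])
print(out)
```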

I'd appreciate any help. If anything is unclear, let me know and I'll add details.

Thank you community!

1 Answer

There are posts on ways to make the cartesian product more efficient, but the basic idea is:

  • merge the frame with itself on 'WEB_ID' to get every combination of 'Category' within an ID. (This automatically produces both the a-c and c-a matches you want.)
  • groupby + size to count the occurrences of each pair.
  • .reindex over all ordered category pairs to get the zeros.
  • remove the rows where a category merged with itself.
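To see what the first and last steps refer to, here is the intermediate self-merge on the sample data (a sketch that rebuilds the question's frame). The shared WEB_ID 7930245 yields both a c-a row and an a-c row, and every ID also contributes rows pairing a category with itself, which is why they have to be removed at the end.

```python
import pandas as pd

df = pd.DataFrame({
    "WEB_ID": [12332405, 3763583, 7930245, 7930245],
    "Category": ["a", "b", "c", "a"],
})

# Self-merge: every pair of rows that share a WEB_ID becomes one row.
# The overlapping 'Category' column gets pandas' default suffixes _x and _y.
pairs = df.merge(df, on="WEB_ID")
print(pairs)
```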

Code:

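import pandas as pd

# Self-merge on WEB_ID: each pair of categories sharing an ID becomes a row
res = (df.merge(df, on='WEB_ID')
         # count each (Category_x, Category_y) combination
         .groupby(['Category_x', 'Category_y']).size()
         # include every ordered pair of categories; unseen pairs become NaN, then 0
         .reindex(pd.MultiIndex.from_product([df.Category.unique()]*2,
                                              names=['s', 't']))).fillna(0)

# drop pairs of a category with itself (e.g. a-a)
res = res[res.index.get_level_values(0) != res.index.get_level_values(1)]
res = res.to_frame('q')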

Output: res

       q
s t     
a b  0.0
  c  1.0
b a  0.0
  c  0.0
c a  1.0
  b  0.0
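If you want the exact flat s, t, q layout from the question with integer counts, one variation (a sketch, not part of the original answer) is to pass `fill_value=0` to `reindex`. This skips the NaN-then-`fillna` step that turns the counts into floats above, and `reset_index` then moves s and t out of the index. The names `full_index`, `counts`, and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "WEB_ID": [12332405, 3763583, 7930245, 7930245],
    "Category": ["a", "b", "c", "a"],
})

cats = df["Category"].unique()
full_index = pd.MultiIndex.from_product([cats, cats], names=["s", "t"])

counts = (
    df.merge(df, on="WEB_ID")
      .groupby(["Category_x", "Category_y"]).size()
      # fill_value=0 fills missing pairs directly, so the counts stay integers
      .reindex(full_index, fill_value=0)
)

# Drop the pairs of a category with itself
counts = counts[counts.index.get_level_values(0) != counts.index.get_level_values(1)]

# Turn the (s, t) MultiIndex into ordinary columns next to q
out = counts.rename("q").reset_index()
print(out)
```

This should print the six rows from the question, with q equal to 1 only for a-c and c-a.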
ALollz