Given a Pandas Dataframe as follows:
value_1 value_2
0 b a
1 a b
2 c d
3 d c
I'd like to count how many times each pair ocurred, despite order. In other words a,b is equal to b,a. I found some cases, like that, but in this case some values are expected to appear in only one column.
First, I try something like:
pd.crosstab(df.value_1,df.value_2).replace(0,np.nan).\
stack().reset_index().rename(columns={0:'Frequency'})
Through this method I also could not remove the duplicates taking into account the order (a,b == b, a)
Then I assume the case of a list like [(a,b), (b,a), (c,d), (d,c)]
, so I created a function to return only the single pairs according to their elements:
from itertools import combinations
def get_unique_pairs(pairs):
unique_pairs = []
for pair in combinations(pairs, 2):
if Counter(pair[0]) == Counter(pair[1]):
unordered_pair = set(list(pair[0]+pair[1]))
unique_pairs.append(unordered_pair)
return {(tuple(e)) for e in unique_pairs}
Their output for [(a,b), (b,a), (c,d), (d,c)]
is {(a,b), (c,d)}
But, it is not viable for a big dataframe.
finally, for the dataframe described above, the expected output is:
pair frequency
0 b; a 2
1 c; d 2