Started from here: unique combinations of values in selected columns in pandas data frame and count
I have found the most to least occurring combinations of 3 columns with this code:
def common_cols(df,n):
'''n is how many of the top results to show'''
df = df.groupby(['A','B','C']).size().reset_index().rename(columns={0:'count'})
df = df.sort_values(by='count', ascending=False).reset_index(drop=True).head(n)
return df
common_data = common_cols(df,10)
Output of common_data(10 top results shown):
A B C count
0 0.00 0.00 0.00 96
1 0.00 1.00 0.00 25
2 0.14 0.86 0.00 19
3 0.13 0.87 0.00 17
4 0.00 0.72 0.28 17
5 0.00 0.89 0.11 16
6 0.01 0.84 0.15 16
7 0.03 0.97 0.00 15
8 0.35 0.65 0.00 15
9 0.13 0.79 0.08 14
Now, I would like to find combinations of the A B C rows, and count how many times they occurred.
For example lets say in the BASE df from row 1 to row 4:
the first set of combinations of the 3 columns (as told by dataframe(df) BEFORE using the common_cols function) are
# each of these rows are their own combination of values
A B C
0 0.67 0.16 0.17
1 0.06 0.73 0.20
2 0.19 0.48 0.33
3 0.07 0.87 0.06
4 0.07 0.60 0.33
The above 5 rows (in order) would be counted as a pattern of combinations. It could be counted as a combination of 2 rows, 3 rows, 4 rows or more rows (if it is easy enough to do that is!)
If this pattern was found once(across the entire dataframe), it would output this pattern's count as 1. If it was found 10 times; the count would be 10.
Any ideas on how I can count the combinations found between consecutive rows? Like from using the common_cols function, but as a 'combinations of combinations'?
The rows have to be in order for it to be a pattern. Any help is massively appreciated!