
I'm using this code just as an example to keep it simple:

import pandas as pd

data = {'x': [1, 2, 3, 1, 2, 1], 'y': [1, 2, 3, 1, 2, 2]}  

# create DataFrame  
df = pd.DataFrame(data)

# count how many times each (x, y) tuple occurs; the count is named n below
occurrence = df.groupby(['x', 'y']).size().sort_values(ascending=False)

occurrence_df = occurrence.to_frame()

occurrence_df.reset_index(inplace=True)

occurrence_df.columns = ['x', 'y', 'n']  # name the columns

I'm trying to make a matrix like this one (an example of roughly how it should look):

X axis: each distinct value of x. Y axis: each distinct value of y. Each cell: the number of times that (x, y) tuple occurs.
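Since `occurrence_df` above is already in long format (one row per tuple with its count `n`), one possible route to that matrix is to pivot it. This is a minimal sketch, assuming the same sample data; `reset_index(name='n')` is used here as a shorter equivalent of the `to_frame()` / `reset_index()` / column-renaming steps, and `matrix` is a hypothetical name.

```python
import pandas as pd

data = {'x': [1, 2, 3, 1, 2, 1], 'y': [1, 2, 3, 1, 2, 2]}
df = pd.DataFrame(data)

# Long format: one row per (x, y) tuple, with its count in column 'n'
occurrence_df = df.groupby(['x', 'y']).size().reset_index(name='n')

# Reshape: distinct y values become rows, distinct x values become columns,
# each cell holds n. Tuples that never occur come out as NaN, so fill them with 0.
matrix = (
    occurrence_df.pivot(index='y', columns='x', values='n')
    .fillna(0)
    .astype(int)
)
print(matrix)
```

The `fillna(0).astype(int)` step is only there because missing combinations would otherwise turn the whole table into floats with `NaN` holes.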

I think a heatmap is the closest thing to what I want.
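If an actual heatmap picture is wanted, one way to sketch it is with matplotlib's `imshow` on the count matrix. This assumes matplotlib is installed; the colormap, the `origin='lower'` choice (so y grows upward like a plot axis) and the output file name `heatmap.png` are illustrative assumptions, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {'x': [1, 2, 3, 1, 2, 1], 'y': [1, 2, 3, 1, 2, 2]}
df = pd.DataFrame(data)

# Count matrix: rows are distinct y values, columns are distinct x values
counts = pd.crosstab(df['y'], df['x'])

fig, ax = plt.subplots()
# origin='lower' draws row 0 (the smallest y) at the bottom
im = ax.imshow(counts.to_numpy(), cmap='viridis', origin='lower')

# Label the ticks with the actual x and y values instead of array positions
ax.set_xticks(range(len(counts.columns)))
ax.set_xticklabels([str(v) for v in counts.columns])
ax.set_yticks(range(len(counts.index)))
ax.set_yticklabels([str(v) for v in counts.index])
ax.set_xlabel('x')
ax.set_ylabel('y')

# Write each count in the middle of its cell
for i in range(counts.shape[0]):
    for j in range(counts.shape[1]):
        ax.text(j, i, str(counts.iat[i, j]), ha='center', va='center')

fig.colorbar(im, ax=ax, label='occurrences')
fig.savefig('heatmap.png')  # hypothetical file name; use plt.show() interactively
plt.close(fig)
```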

I've been bashing my head against this for a few days now; I'd really appreciate any help.

Jacquefr
  • If you actually wanted missing values (`NaN`), I'd recommend unstacking like [this answer](https://stackoverflow.com/a/39132900/15497888) `occurrence_df = df.groupby(['x', 'y']).size().unstack()`. [This answer](https://stackoverflow.com/a/29877565/15497888) goes through a plotting option if you need an actual heatmap. – Henry Ecker Mar 19 '22 at 20:10
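To see the difference the comment describes, here is a small sketch on the same sample data (variable names are illustrative). Note that grouping by `['x', 'y']` and unstacking puts x on the rows and y on the columns, i.e. the transpose of the crosstab in the answer below.

```python
import pandas as pd

data = {'x': [1, 2, 3, 1, 2, 1], 'y': [1, 2, 3, 1, 2, 2]}
df = pd.DataFrame(data)

counts = df.groupby(['x', 'y']).size()  # Series indexed by (x, y)

# unstack() moves the inner level (y) into columns;
# tuples that never occur become NaN (and the dtype becomes float)
with_nan = counts.unstack()

# fill_value=0 puts 0 in those cells instead,
# giving the same table as pd.crosstab(df['x'], df['y'])
with_zeros = counts.unstack(fill_value=0)

# .T swaps rows and columns to match the y-by-x layout
print(with_zeros.T)
```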

1 Answer


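What you want is a cross-tabulation of the two columns (it is called a confusion matrix when x and y are predicted and actual labels), and `pd.crosstab` builds it directly.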

import pandas as pd

data = {'x': [1, 2, 3, 1, 2, 1], 'y': [1, 2, 3, 1, 2, 2]}

# create DataFrame
df = pd.DataFrame(data)

df_confusion_matrix = pd.crosstab(df["y"], df["x"])
print(df_confusion_matrix)

Expected result

x  1  2  3
y         
1  2  0  0
2  1  2  0
3  0  0  1

You can read it like this, which is exactly the layout you described:

          x
       1  2  3
      ---------- 
    1 | 2  0  0
y   2 | 1  2  0
    3 | 0  0  1

There are many other ways to do this as well; you might want to have a look at this thread.
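As a sketch of two such alternatives on the same data (names are illustrative): `DataFrame.value_counts` on both columns followed by `unstack`, which assumes pandas 1.1 or newer, and `crosstab` with `margins=True`, which appends an `All` row and column holding the totals.

```python
import pandas as pd

data = {'x': [1, 2, 3, 1, 2, 1], 'y': [1, 2, 3, 1, 2, 2]}
df = pd.DataFrame(data)

# Option 1: count each (y, x) pair, then spread the x level into columns.
# DataFrame.value_counts(subset=...) needs pandas >= 1.1.
via_value_counts = df.value_counts(subset=['y', 'x']).unstack(fill_value=0)

# Option 2: crosstab with margins=True also adds an 'All' row and column
# holding the row totals, column totals and grand total.
with_totals = pd.crosstab(df['y'], df['x'], margins=True)
print(with_totals)
```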

Mushroomator