1

I have a dataframe with two categorical columns and a third with integers:

import pandas as pd

df1 = pd.DataFrame({
    'First': ['A','A','A','B','B','C'], 
    'Second': ['B','C','D','C','D','D'], 
    'Value': [1,2,3,4,5,6]}
)

df1

    First   Second  Value
0   A   B   1
1   A   C   2
2   A   D   3
3   B   D   4
4   B   D   5
5   C   D   6

I would like to get the corresponding triangular matrix, as (missing values can be NAs):

A B C D
  1 2 3 A
    4 5 B
      6 C

and finally, plot it in a triangular heatmap, which I believe I could be able to do with the help of this question, which, however, requires a numpy masked array as input. Other solutions to plot this without using numpy are also very welcome.

Any pythonic ideas of how to achieve this?

EDIT:

I realized the example I gave was too neat. My columns are not organized in the way they are above. So I have something like this:

df1 = pd.DataFrame({
'First': ['D','C','B','A','C','A','B','D','B','C'], 
'Second': ['E','E','C','D','D','E','E','B','A','A'], 
'Value': [1,2,3,4,5,6,7,8,9,10]}

)

      First Second  Value
0     D      E      1
1     C      E      2
2     B      C      3
3     A      D      4
4     C      D      5
5     A      E      6
6     B      E      7
7     D      B      8
8     B      A      9
9     C      A     10

and

df1.pivot('First','Second','Value')

produces

  Second    A    B      C      D      E
First                   
A          NaN  NaN     NaN     4.0     6.0
B          9.0  NaN     3.0     NaN     7.0
C         10.0  NaN     NaN     5.0     2.0
D          NaN  8.0     NaN     NaN     1.0

A non-triangular, non-simmetrical matrix. I need to have the same number of rows and columns and to push all these NaNs to the edges to create a triangle. Pivot doesn't seem to be a possible solution

EDIT2

The solution and desired output exists and is:

    A   B   C   D   E
A   NaN 9   10  4   6
B   NaN NaN 3   8   7
C   NaN NaN NaN 5   2
D   NaN NaN NaN NaN 1
E   NaN NaN NaN NaN NaN
JRCX
  • 249
  • 2
  • 14

1 Answers1

2

You can pivot and then pass DataFrame to your linked solution:

df = df1.pivot('First','Second','Value')
print (df)
Second    B    C    D
First                
A       1.0  2.0  3.0
B       NaN  4.0  5.0
C       NaN  NaN  6.0

from matplotlib import pyplot as PLT
from matplotlib import cm as CM


fig = PLT.figure()
ax1 = fig.add_subplot(111)
cmap = CM.get_cmap('jet', 10) # jet doesn't have white color
cmap.set_bad('w') # default value is 'k'
#passed DataFrame
ax1.imshow(df, interpolation="nearest", cmap=cmap)
ax1.grid(True)
PLT.show()

pic

EDIT: Solution is sorting columns First and Second per rows:

df1[['First','Second']] = np.sort(df1[['First','Second']], axis=1)
df = df1.pivot('First','Second','Value')
print (df)
Second    B     C    D    E
First                      
A       9.0  10.0  4.0  6.0
B       NaN   3.0  8.0  7.0
C       NaN   NaN  5.0  2.0
D       NaN   NaN  NaN  1.0

from matplotlib import pyplot as PLT
from matplotlib import cm as CM

#
fig = PLT.figure()
ax1 = fig.add_subplot(111)
cmap = CM.get_cmap('jet', 10) # jet doesn't have white color
cmap.set_bad('w') # default value is 'k'
#passed DataFrame
ax1.imshow(df, interpolation="nearest", cmap=cmap)
ax1.grid(True)
PLT.show()

pic2

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you. But I still have the problem of not having a triangular shape with a large dataframe/real data. The NaNs are spread all across the pivot. I tried sorting with df2 = df.reindex(df.sum().sort_values(ascending=False).index, axis=1) and repeat for columns but the NaNs are not pushed to the edges. I cannot use fillna with the pivot because the sorting will not produce the triangle – JRCX Aug 23 '18 at 13:52
  • @JRCX - Hmmm, I think problem is if missing some value like `df1 = pd.DataFrame({ 'First': ['A','A','B','B','C'], 'Second': ['B','C','C','D','D'], 'Value': [1,2,4,5,6]} )` pandas always create 2d matrix and added NaNs. I think it is normal for real data. – jezrael Aug 23 '18 at 13:55
  • 1
    Yes, and I want the NaNs to be there. They will allow for the half of the square to be blank. But in my example (now I'm realizing it was a bad one) the data is neat and sorted. In a real example, pivot scatters the NaNs in an unsorted, not-triangular result. – JRCX Aug 23 '18 at 14:01
  • 1
    @JRCX - Added solution for you, check edited answer. – jezrael Aug 23 '18 at 14:27
  • justify messes up the data. The image created is indeed a triangle, but the lines and columns of the array do not correspond anymore to the original columns and rows of the pivot dataframe. There is also no way to access these anymore to input them to the plot axis. – JRCX Aug 23 '18 at 15:15
  • The solution exists. It has to, as the input is all combinations possible between pairs of values. That translates into a Triangular matrix. But pivot cannot achieve it... Check Edit2 please. Thank you so much for all the help so far, sorry for the long thread, this should be simpler – JRCX Aug 23 '18 at 15:58
  • @JRCX - Yo uare rigtht, solution exist. Cehc edited answer. – jezrael Aug 23 '18 at 18:14