0

Input

I have the following file input.txt:

D E F G H
a 1 b 1 4
a 1 c 1 5
b 2 c 2 6

Desired output

How can I create a new data frame, that uses columns D and E as an index? I want a triangular matrix that looks something like this:

   a1 b1 c1 b2 c2
a1  0  4  5  0  0
b1     0  0  0  0
c1        0  0  0
b2           0  6
c2              0

1st attempt

I am importing the data frame and I am trying to do a pivot like this:

import pandas as pd
df1 = pd.read_csv(
    'input.txt', index_col=[0,1], delim_whitespace=True,
    usecols=['D','E','F','G','H'])
df2 = df1.pivot(index=['D', 'E'], columns=['F','G'], values='H')

df1 looks like this:

     F  G  H
D E         
a 1  b  1  4
  1  c  1  5
b 2  c  2  6

df1.index looks like this:

MultiIndex(levels=[['a', 'b'], [1, 2]],
           labels=[[0, 0, 1], [0, 0, 1]],
           names=['D', 'E'])

df2 fails to be generated and I get this error message:

`KeyError: "['D' 'E'] not in index"`

2nd attempt

I thought I had solved it like this:

import pandas as pd
df = pd.read_csv(
    'input.txt', delim_whitespace=True,
    usecols=['D','E','F','G','H'],
    dtype={'D':str, 'E':str, 'F':str, 'G':str, 'H':float},
    )
pivot = pd.pivot_table(df, values='H', index=['D','E'], columns=['F','G'])

pivot looks like this:

F     b   c    
G     1   1   2
D E            
a 1   4   5 NaN
b 2 NaN NaN   6

But when I try to do this to convert it to a symmetric matrix:

pivot.add(df.T, fill_value=0).fillna(0)

Then I get this error:

ValueError: cannot join with no level specified and no overlapping names

3rd attempt and solution

I found a solution here. It is also what @Moritz suggested, but I'm new to pandas and didn't understand his comment. I did this:

import pandas as pd
df1 = pd.read_csv(
    'input.txt', index_col=[0,1], delim_whitespace=True,
    usecols=['D','E','F','G','H'],
    dtype={'D':str, 'E':str, 'F':str, 'G':str, 'H':float}
    )
df1['DE'] = df1['D']+df1['E']
df1['FG'] = df1['F']+df1['G']
df2 = df1.pivot(index='DE', columns='FG', values='H')

This data frame is generated:

FG  b1  c1  c2
DE            
a1   4   5 NaN
b2 NaN NaN   6

Afterwards I do df3 = df2.add(df2.T, fill_value=0).fillna(0) to convert the triangular matrix to a symmetric matrix. Is generating new columns really the easiest way to accomplish what I want? My reason for doing all of this is that I want to generate a heat map with matplotlib and hence need the data to be in matrix form. The final matrix/dataframe looks like this:

    a1  b1  b2  c1  c2
a1   0   4   0   5   0
b1   4   0   0   0   0
b2   0   0   0   0   6
c1   5   0   0   0   0
c2   0   0   6   0   0
Community
  • 1
  • 1
tommy.carstensen
  • 8,962
  • 15
  • 65
  • 108
  • 1
    What if you try something like df ['D'].map(str) +df ['E'].map(str) etc. ? – Moritz May 12 '15 at 23:20
  • Please post the solution and what you are working on – Moritz May 12 '15 at 23:20
  • Whoa, lot of stuff here now. Would be helpful to remove the solved stuff and just leave the parts relevant to the current issue you are trying to solve. Otherwise folks are going to waste time reading irrelevant aspects and might not even make it all the way to the the actual question... – JohnE May 12 '15 at 23:58
  • @JohnE actually on second thought I think it will be beneficial for others to see all the mistakes I made? – tommy.carstensen May 13 '15 at 00:01
  • @Moritz I realise now that you gave me the answer. I'm new to Pandas, so I didn't understand your comment. Can you leave an answer? I posted the solution under `EDIT2`. – tommy.carstensen May 13 '15 at 00:07
  • I am too lazy right now. I will do that later on. Currently I do only have my tablet. – Moritz May 13 '15 at 11:12

0 Answers0