2

Suppose I have Panda DataFrame which looks following:

Input

Name    Key Val
David   A   1
Roe     B   2
John    A   3
Nat     B   4

I want to split by Key and Group by Name.

Output

Name    A   B
David   1   nan
John    3   nan
Nat     nan  4
Roe     nan  2

Can you please suggest a way to do it?

Below is code to generate the dataframe.

import pandas as pd
# Initializing the nested list with Data-set
data = [['David','A',1],
        ['Roe','B',2],
        ['John','A',3],
        ['Nat','B',4]]
df = pd.DataFrame(data, columns=['Name', 'Key','Val'])
Dibyendu Dey
  • 349
  • 2
  • 16

3 Answers3

1

Use reshaping like this, set_index with Name and Key creating a multiIndex, then unstack the inner most index level to create columns:

df.set_index(['Name','Key'])['Val'].unstack()

Output:

Key      A    B
Name           
David  1.0  NaN
John   3.0  NaN
Nat    NaN  4.0
Roe    NaN  2.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Great answer. Suppose have 4 columns in the initial table. 4th column say 'comment'. But I still want the split table to be df.set_index(['Name','Key'])['Val'].unstack(). I want to discard, 'comment' column. In that case, it will error out. – Dibyendu Dey Mar 06 '21 at 04:20
  • hmm..I think one way is cleanup those unnecessary columns and then use method you suggested. But not sure if there is way to apply the above method, without deleting columns which I don't need in split table. – Dibyendu Dey Mar 06 '21 at 04:26
  • 1
    If you have extra columns you can add them to the index to keep that information just after Name before key. – Scott Boston Mar 06 '21 at 15:03
  • 1
    Hi Scottt - Thanks for your help. I'm still facing some issue which I have put together below. Can you please put some comments when get sometime. Thanks, Dib – Dibyendu Dey Mar 09 '21 at 03:08
0

Example input data:

data = {"Id": ["DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA"],
        "Port" : [1,1,1,1,1],
        "Lane" :[None,None,None,None,None],
        "Key" : ["start_now", "start","case","case_start","end"],
        #"val": [0.000001,0.2,0.3,0.4,0.5]
        "Val":[0.000001,0.2,0.3,0.4,0.5] #need to capitalize this 'val' to match dataframe definition below
        }
df = pd.DataFrame(data,columns=['Id', 'Port','Lane','Key','Val'])
df1 = df.set_index(['Id', 'Port','Lane','Key'])['Val'].unstack()
df1 = df1.reset_index()

Exapected output:

Id Port Lane start_now start case case_start end
DA0445EA 1 nan 0.000001 0.2 0.3 0.4 0.5

Issue I'm getting:

  1. all floating number are shown as NAN
  2. individual columns (Id, Port, Lane) aren't shown as columns
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
Dibyendu Dey
  • 349
  • 2
  • 16
  • 1
    I edited your solution. You need to capitalize 'val' to match you columns in your dataframe constructor this is why you were getting NaNs. – Scott Boston Mar 09 '21 at 14:05
  • so helpful Scott. I'm facing another issue (hopefully last one). Below is the scenario I give. – Dibyendu Dey Mar 09 '21 at 19:05
0
import pandas as pd
data = {"Id": ["DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA", "DA0445EA"],
        "Port" : [1,1,1,1,1],
        "Lane" :[None,None,None,None,None],
        "Key" : ["start", "start","case","case_start","end"],
        "Val":[0.1,0.1,0.3,0.4,0.5] 
        }
df = pd.DataFrame(data,columns=['Id', 'Port','Lane','Key','Val'])
df1 = df.set_index(['Id', 'Port','Lane','Key'])['Val'].unstack()
df1 = df1.reset_index()

Below is Error generated. The reason is "There is two duplicate entry for same id". Such entry is unavoidable for datatable I'm working on. I was expecting, It will overwrite same line in table and won't throw any error like below.

raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape

Is there a way to avoid such error during Re-shaping?

Dibyendu Dey
  • 349
  • 2
  • 16
  • ah.. now this duplicate entrie error leads to a a whole different solution is needed. You're going to need to groupby or pivot_table with some sort of aggregration. Take a look at this post(https://stackoverflow.com/a/47152692/6361531). – Scott Boston Mar 09 '21 at 19:29
  • Thanks for all your help. Finally I could make it work using pivot. – Dibyendu Dey Mar 09 '21 at 21:52