1

I have a dataframe that looks like this

Date     ID   Value
Mar-20   A     100
Mar-20   B     200
Mar-20   C     300
Apr-20   A     101
Apr-20   B     201
Apr-20   C     301

I'm trying to re-shape it so that it looks like this

        Value
ID       A      B      c
Date
Mar-20  100    200    300
Apr-20  101    201    301

I started off by trying

df = df.set_index(["Date", "ID"])

But unstacking this let to the following value error: "Index contains duplicate entries, cannot reshape" So I found this suggestion on SO, which allowed me to unstack

df = df.set_index(["Date", "ID"].append = True)

But when I then unstack by "ID" (ie df = df.unstack("ID") my end dataframe looks like this

         Value 
Date     ID     A    B    C
Mar-20   A     100  NaN  NaN
Mar-20   B     NaN  200  NaN
Mar-20   C     NaN  NaN  300
Apr-20   A     101  NaN  NaN 
Apr-20   B     NaN  201  NaN 
Apr-20   C     NaN  NaN  301

How can I compress this to get the output I'm after?

Thanks very much

harrison10001
  • 109
  • 1
  • 6

1 Answers1

2

Use DataFrame.unstack for expected output (order of index was changed):

df = df.set_index(["Date", "ID"]).unstack()
print (df)
       Value          
ID         A    B    C
Date                  
Apr-20   101  201  301
Mar-20   100  200  300

For correct order is possible add to_datetime:

df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')
df = df.set_index(["Date", "ID"]).unstack()
print (df)
           Value          
ID             A    B    C
Date                      
2020-03-01   100  200  300
2020-04-01   101  201  301

If need original format in correct order:

df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')
df = df.set_index(["Date", "ID"]).unstack().rename(lambda x: x.strftime('%b-%y'))
print (df)
       Value          
ID         A    B    C
Date                  
Mar-20   100  200  300
Apr-20   101  201  301

If there are only 3 columns is possible use DataFrame.pivot, but if more columns it failed, so rather not use it if general data:

df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')
df = df.pivot(*df.columns).rename(lambda x: x.strftime('%b-%y'))
print (df)
ID        A    B    C
Date                 
Mar-20  100  200  300
Apr-20  101  201  301

If any columns in input data and need pivoting only some 3 columns beter is use:

df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')
df = df.pivot('Date','ID','Value').rename(lambda x: x.strftime('%b-%y'))
print (df)
ID        A    B    C
Date                 
Mar-20  100  200  300
Apr-20  101  201  301

EDIT: If get error:

Index contains duplicate entries, cannot re-shape

it means there are duplicates per pairs Date, ID, so is necessary use aggregate function, like sum, mean in DataFrame.pivot_table:

print (df)
     Date ID  Value
0  Mar-20  A    100 <- same Date, ID
1  Mar-20  A    500 <- same Date, ID
2  Mar-20  B    200
3  Mar-20  C    300
4  Apr-20  A    101
5  Apr-20  B    201
6  Apr-20  C    301



df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')
df = df.pivot_table(index='Date',
                    columns='ID',
                    values='Value',
                    aggfunc='sum').rename(lambda x: x.strftime('%b-%y'))
print (df)
ID        A    B    C
Date                 
Mar-20  600  200  300 < aggregate sum 100+500=600
Apr-20  101  201  301

If need column Value in MultiIndex use:

df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')
df = df.pivot_table(index='Date',
                    columns='ID',
                    values=['Value'], 
                    aggfunc='sum').rename(lambda x: x.strftime('%b-%y'))
print (df)
       Value          
ID         A    B    C
Date                  
Mar-20   600  200  300
Apr-20   101  201  301

Better solution if need avoid sorting, thanks @anky:

df = df.groupby(["Date", "ID"],sort=False)['Value'].sum().unstack()
print (df)
ID        A    B    C
Date                 
Mar-20  600  200  300
Apr-20  101  201  301
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks so much for coming back to me. When I try the first version, I get the value error of "Index contains duplicate entries, cannot re-shape". When I add append = True to set_index, I then get loads of different versions of the index (1 for each ID) – harrison10001 Jun 03 '20 at 14:17
  • @harrison10001 - Check last paragraph for solution. – jezrael Jun 03 '20 at 14:21