0

I am trying to pivot a dataframe such that the unique values in an 'ID' column will be used for column labels and a multi-index will be created to organize the data into grouped rows. The second level of the multi-index, will be unique values obtained from the 'date' column and the first level of the multi-index will contain all other columns 'not considered' in the pivoting operation.

Here's the dataframe sample:

df = pd.DataFrame(
    data=[['A', '10/19/2020', 33, 0.2],
          ['A', '10/6/2020', 17, 0.6],
          ['A', '11/8/2020', 7, 0.3],
          ['A', '11/14/2020', 19, 0.2],
          ['B', '10/28/2020', 26, 0.6],
          ['B', '11/6/2020', 19, 0.3],
          ['B', '11/10/2020', 29, 0.1]],
    columns=['ID', 'Date', 'Temp', 'PPM'])

original df

    ID  Date        Temp    PPM
0   A   10/19/2020  33      0.2
1   A   10/6/2020   17      0.6
2   A   11/8/2020   7       0.3
3   A   11/14/2020  19      0.2
4   B   10/28/2020  26      0.6
5   B   11/6/2020   19      0.3
6   B   11/10/2020  29      0.1

desired output

      ID            A       B
      Date           
Temp  10/19/2020    33      NaN
      10/28/2020    NaN     26
      11/6/2020     17      19
      11/8/2020     7       NaN
      11/10/2020    NaN     29
      11/14/2020    19      NaN
PPM   10/19/2020    0.2     NaN
      10/28/2020    NaN     0.6
      11/6/2020     0.6     0.3
      11/8/2020     0.3     NaN
      11/10/2020    NaN     0.1
      11/14/2020    0.2     NaN

I took a look at this extensive answer for pivoting dataframes in pandas, but I am unable to see how it covers/apply it to, the specific case I am trying to implement.

EDIT: While I've provided dates as strings in the sample, these are actually datetime64 objects in the full dataframe I'm dealing with.

eNc
  • 1,021
  • 10
  • 23

1 Answers1

2

Let us try set_index and unstack

out = df.set_index(['ID','Date']).unstack().T
Out[27]: 
ID                  A     B
     Date                  
Temp 10/19/2020  33.0   NaN
     10/28/2020   NaN  26.0
     10/6/2020   17.0   NaN
     11/10/2020   NaN  29.0
     11/14/2020  19.0   NaN
     11/6/2020    NaN  19.0
     11/8/2020    7.0   NaN
PPM  10/19/2020   0.2   NaN
     10/28/2020   NaN   0.6
     10/6/2020    0.6   NaN
     11/10/2020   NaN   0.1
     11/14/2020   0.2   NaN
     11/6/2020    NaN   0.3
     11/8/2020    0.3   NaN
BENY
  • 317,841
  • 20
  • 164
  • 234
  • When I try this method, I get a ValueError exception: Index contains duplicate entries, cannot reshape. In my full dataframe, my date column actually contains date objects which may be the cause of this. – eNc Nov 15 '20 at 15:06
  • I threw a groupby + agg with 'last' in there to deal with duplicate dates. Thanks for your ans. – eNc Nov 15 '20 at 15:20