Pandas: Select rows and add on the right as multiple sets of columns (reshape?)

Question

df = pd.DataFrame({
    'pid': [10,20,10,30], 
    'sid': [1,1,2,3],
    'data1': ['a','b','a','c'],
    'data2': ['q','w','e','e'],
})

   pid  sid data1 data2
0   10    1     a     q
1   20    1     b     w
2   10    2     a     e
3   30    3     c     e

Can be many rows data. Perhaps, can use index = ['pid', 'sid'].

How I can transform it to Excel in the following format?

That is, index by pid, select row blocks by the 'sid', with iteration by incrementation of it. Then add these as blocks of columns to the right.

sid   1             2              3
pid   data1  data2  data1  data2   data1  data2
 10   a      q      a      e       None   None
 20   b      w      None   None    None   None
 30   None   None   None   None    c      e

Thank you.

Does this answer your question? [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) — Ben.T, Apr 23 '20 at 17:29
Thanks you for the common help link. But unfortunately, it's no solution there. Testing all the options will take a week, and I doubt that I will find a solution. In this case, there may be a mass of columns like `dataN`, and a mass of `pid` and `sid`. While, `pivot()` requires a specific one column with new columns lists, and doesn't control subdivides by multiple columns. — bl79, Apr 23 '20 at 17:59
as you can see from the answer of @r.ook, doing a pivot `df.pivot(index='pid',columns='sid')` and even not specifying the values will work with any number of column data really. the two other steps are mostly "cosmetic" while necessary I understand that :) — Ben.T, Apr 23 '20 at 18:13

r.ook · Accepted Answer · 2020-04-23T18:21:41.197

There's a very good link in the comment. However to answer your specific question, here's a way to achieve it:

>>> df = df.pivot(index='pid',columns='sid',values=['data1','data2'])
>>> df
    data1           data2          
sid     1    2    3     1    2    3
pid                                
10      a    a  NaN     q    e  NaN
20      b  NaN  NaN     w  NaN  NaN
30    NaN  NaN    c   NaN  NaN    e

This gives you the proper pivot, but not necessarily in the order you want it. So we'll need to swap the level on the columns:

>>> df.columns = df.columns.swaplevel()
>>> df
sid     1     2     3     1     2     3
    data1 data1 data1 data2 data2 data2
pid                                    
10      a     a   NaN     q     e   NaN
20      b   NaN   NaN     w   NaN   NaN
30    NaN   NaN     c   NaN   NaN     e

And then sort it:

>>> df = df.sort_index(axis=1)
>>> df
sid     1           2           3      
    data1 data2 data1 data2 data1 data2
pid                                    
10      a     q     a     e   NaN   NaN
20      b     w   NaN   NaN   NaN   NaN
30    NaN   NaN   NaN   NaN     c     e

Or you could just chain the methods together:

>>> df = df.pivot(index='pid',columns='sid',values=['data1','data2']).swaplevel(axis=1).sort_index(axis=1)
>>> df 
sid     1           2           3      
    data1 data2 data1 data2 data1 data2
pid                                    
10      a     q     a     e   NaN   NaN
20      b     w   NaN   NaN   NaN   NaN
30    NaN   NaN   NaN   NaN     c     e

you could chain everything because `swaplevel` can be applied to a dataframe directly and take axis=1 as argument but it would not make fewer steps ^^ — Ben.T, Apr 23 '20 at 18:20
Thanks @Ben.T, I was *just* updating my answer lol. These types of shortcuts are not always immediate obvious to me, so appreciate the feedback. — r.ook, Apr 23 '20 at 18:23

Pandas: Select rows and add on the right as multiple sets of columns (reshape?)

1 Answers1