4

Hello I have an example data frame below. I am having trouble obtain the desired results through transpose....

x = ('P', 'P', 'O', 'DNP', 'D')
y = ('O', 'O', 'D', 'DNP', 'DNP')
z = ('P', 'P', 'O', 'U', 'DNP')
a = ('O', 'O', 'D', 'DNP', 'DNP')
b = ('P', 'DNP', 'O', 'U', 'DNP')
ID = ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']
df = DataFrame(zip(ID, a, b, x, y, z), columns = ['id', 'a', 'b', 'x', 'y', 'z'])

    id    a    b    x    y    z
0  ID1    O    P    P    O    P
1  ID2    O  DNP    P    O    P
2  ID3    D    O    O    D    O
3  ID4  DNP    U  DNP  DNP    U
4  ID5  DNP  DNP    D  DNP  DNP

A simple df.transpose() produces...

0    1    2    3    4
id  ID1  ID2  ID3  ID4  ID5
a     O    O    D  DNP  DNP
b     P  DNP    O    U  DNP
x     P    P    O  DNP    D
y     O    O    D  DNP  DNP
z     P    P    O    U  DNP

The desired output is as follows....

   ID1    a    O
   ID1    b    P
   ID1    x    P
   ID1    y    O
   ID1    z    P
   ID2    a    O
   ID2    b    DNP
   ID2    x    P
   ID2    y    O
   ID2    z    P

and so on and so forth..... I appreciate any help!

Anand S Kumar
  • 88,551
  • 18
  • 188
  • 176
boothtp
  • 311
  • 4
  • 14

1 Answers1

3

You could use pd.melt:

In [23]: pd.melt(df, id_vars=['id'], var_name='colvals', value_name='DOPU')
Out[23]: 
     id colvals DOPU
0   ID1       a    O
1   ID2       a    O
2   ID3       a    D
...
21  ID2       z    P
22  ID3       z    O
23  ID4       z    U
24  ID5       z  DNP

Or, alternatively, you could set id as the index before calling stack:

In [21]: df.set_index('id').stack()
Out[21]: 
id    
ID1  a      O
     b      P
     x      P
     y      O
     z      P
...         
ID5  a    DNP
     b    DNP
     x      D
     y    DNP
     z    DNP
dtype: object

stack moves the column level values into the index. Since the desired result has id values in the index as well, it is natural to use set_index to move the id column into the index first, and then to call stack.


Call reset_index to move the index levels into DataFrame columns:

In [164]: df.columns.name = 'colvals'
In [165]: df.set_index('id').stack().reset_index()
Out[165]: 
     id colvals    0
0   ID1       a    O
1   ID1       b    P
2   ID1       x    P
3   ID1       y    O
4   ID1       z    P
...
20  ID5       a  DNP
21  ID5       b  DNP
22  ID5       x    D
23  ID5       y  DNP
24  ID5       z  DNP
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Thanks unutbu, A lot easier than I thought. Is there a way to not set the ID column as the index so the desired result will have the ID for every instance? I want to make the result into a new dataframe. Thanks! – boothtp Aug 10 '15 at 02:55
  • @boothtp: Use `reset_index` to move the index levels into (DataFrame) columns. – unutbu Aug 10 '15 at 03:12