0

I have the following data frames:

import pandas as pd
df = pd.DataFrame( { 
    "Name" : ["gene1","gene2","gene3","gene4"] , 
    "T1" : [0.33,1,3,4],
    "T2" : [1.23,2.1,3.5,5.0], } )

Which looks like this:

In [30]: df
Out[30]:
    Name    T1    T2
0  gene1  0.33  1.23
1  gene2  1.00  2.10
2  gene3  3.00  3.50
3  gene4  4.00  5.00

What I want to do is group based on T1 and T2 (basically all columns from 2nd columns onward). Note that the column names follow no pattern and can be more than two.

The end result I hope to get is this:

T1
    Name    T1   
0  gene1  0.33  
1  gene2  1.00  
2  gene3  3.00  
3  gene4  4.00 

T2
    Name    T2
0  gene1    1.23
1  gene2    2.10
2  gene3    3.50
3  gene4    5.00

How can I achieve that?

I tried this but doesn't give what I want:

tlist = list(df.columns[1:])
for dft in df.groupby(tlist,axis=1):
    print df
neversaint
  • 60,904
  • 137
  • 310
  • 477
  • In your hoped result you are not grouping, you are just listing the columns. If that's really what you want, you just have to set "Name" as an index. `df = df.set_index("Name")` and then print the results `for c in df.columns: print(df[c])` – Geeklhem Mar 30 '15 at 03:35

1 Answers1

2

You can get there using pd.melt():

melted = pd.melt(df, id_vars='Name', var_name='t_col')

for t_col, sub_df in melted.groupby('t_col'):
    print(sub_df)

    Name t_col  value
0  gene1    T1   0.33
1  gene2    T1   1.00
2  gene3    T1   3.00
3  gene4    T1   4.00
    Name t_col  value
4  gene1    T2   1.23
5  gene2    T2   2.10
6  gene3    T2   3.50
7  gene4    T2   5.00
Marius
  • 58,213
  • 16
  • 107
  • 105