
I have a pandas DataFrame with:

1st column = subject_id,
2nd column = voxel_type (categorical; T, P, or C),
3rd to 10th columns = floats

Since I have multiple rows for each subject, I want to collapse them so that only one row per subject remains. My first thought was to use groupby, as other questions on SO suggest. However, the number of rows differs from subject to subject. How do I approach this?

Thank you so much!

    Please include a [`Minimal, Reproducible Example`](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – sushanth Jul 28 '20 at 05:45

1 Answer


Good day,

It's a bit hard to tell exactly what you want, but here is something in that direction. You are most likely looking for pivot or pivot_table. Here is an example based on my best guess:

import pandas as pd
import numpy as np

Let's generate some dummy data:

begin = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                      'tp': ['t', 'p', 'c'] * 3,
                      'foo': [np.random.random() + 1 for _ in range(9)],
                      'bar': [np.random.random() + 2 for _ in range(9)]})

Assumption: tp is not evenly distributed, and a single id might have several rows with the same tp. In that case we just pivot the table and list the columns we want to keep as values (duplicates are averaged by default):

pd.pivot_table(begin, values=['foo', 'bar'], columns=['id', 'tp'])
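If the goal is literally one row per id, a possible variation is to pass index='id' as well, so that ids become rows and each (value, tp) pair becomes a column. This is only a minimal sketch on made-up numbers: the frame df and the flattened column names such as foo_t are assumptions for illustration, not something taken from the question.

```python
import pandas as pd

# Hypothetical toy data: ids have different numbers of rows,
# and id 1 has two 't' rows while id 2 has none.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "tp": ["t", "p", "c", "t", "p", "c", "t", "p", "c"],
    "foo": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "bar": [10.0, 20.0, 30.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
})

# One row per id, one column per (value, tp) pair.
# Duplicate (id, tp) combinations are averaged; missing ones become NaN.
wide = pd.pivot_table(df, index="id", columns="tp",
                      values=["foo", "bar"], aggfunc="mean")

# Flatten the two-level column index into names like 'foo_t'.
wide.columns = [f"{value}_{tp}" for value, tp in wide.columns]
wide = wide.reset_index()
```

Here id 1 gets the average of its two 't' rows in foo_t, and id 2 gets NaN there because it has no 't' row at all.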

If the assumption about tp is wrong, you can pivot on id alone (tp is left out of values because it holds strings and cannot be averaged):

pd.pivot_table(begin, values=['foo', 'bar'], columns='id')
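Since the question mentions groupby: differing row counts per id should not be a problem there, because each group is aggregated on its own. A minimal sketch, again on made-up numbers (the frame df and the choice of mean as the aggregation are assumptions; any other reducer such as median or max could be used instead):

```python
import pandas as pd

# Hypothetical toy data: id 1 has four rows, id 2 has two, id 3 has three.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "tp": ["t", "p", "c", "t", "p", "c", "t", "p", "c"],
    "foo": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "bar": [10.0, 20.0, 30.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
})

# Collapse every id to a single row by averaging its numeric columns.
# The string column 'tp' is excluded by selecting only 'foo' and 'bar'.
per_id = df.groupby("id")[["foo", "bar"]].mean().reset_index()
```

This discards the tp information entirely, so it only fits if the voxel type does not need to survive the collapse.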

For more, see the pandas documentation on pivot_table.

pinegulf