
I have a DataFrame like this:

Input:

student_id  rep
abc100      1   
abc101      2
abc102      1
abc103      2
abc104      1
abc105      2
abc106      1
abc107      2

Expected output:

1       2
abc100  abc101
abc102  abc103
abc104  abc105
abc106  abc107

I tried

df = df.pivot(columns='rep', values='student_id')

but the result contains a lot of NaNs and doesn't match the expected output.
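To see where the NaNs come from, here is a minimal reproduction, assuming the sample data above is typed in by hand. When `index` is omitted, `pivot` uses the DataFrame's existing row index (0 to 7 here), so every input row becomes its own output row and only one of the two rep columns is filled in each row:

```python
import pandas as pd

# Sample data from the question, typed in by hand
df = pd.DataFrame({
    'student_id': ['abc100', 'abc101', 'abc102', 'abc103',
                   'abc104', 'abc105', 'abc106', 'abc107'],
    'rep': [1, 2, 1, 2, 1, 2, 1, 2],
})

# No index= given, so pivot keeps the default RangeIndex 0..7:
# each row keeps its own label, and the other rep column is NaN
wide = df.pivot(columns='rep', values='student_id')
print(wide)
```

The fix is to give `pivot` an index that pairs rows up, i.e. a counter that is the same for the first rep-1 row and the first rep-2 row, and so on.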

I searched Stack Overflow but couldn't find an answer.

WonderWomen
  • As general advice, please provide samples as text, not images. – Yuca Dec 17 '18 at 16:12
  • `df.reset_index().pivot(index='index', columns='rep', values='student_id')` – cs95 Dec 17 '18 at 16:12
  • @coldspeed that doesn't yield the desired output, but maybe that's because of my index assumptions – Yuca Dec 17 '18 at 16:16
  • @coldspeed your solution doesn't work and this is not a duplicate. I would suggest you reopen this question. – GeorgeOfTheRF Dec 17 '18 at 16:19
  • @Yuca Perhaps the index should be the result of groupby and cumcount... hmm, yeah, that might work. – cs95 Dec 17 '18 at 16:19
  • @GeorgeOfTheRF Gladly... just need to wait for OP to replace their images with text ;-) – cs95 Dec 17 '18 at 16:19
  • @coldspeed that's effectively what I suggested. It's scary how some solutions match exactly how others think, makes me feel good hehe – Yuca Dec 17 '18 at 16:21

2 Answers

To match the desired output exactly, you could do:

# number the rows within each rep group: 0, 1, 2, ...
df['aux'] = df.groupby('rep').cumcount()
df.pivot(index='aux', columns='rep', values='student_id')

Output:

rep       1       2
aux                
0    abc100  abc101
1    abc102  abc103
2    abc104  abc105
3    abc106  abc107
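For reference, here is the same approach as a self-contained script. It assumes the sample data from the question; the two lines that clear the `aux` and `rep` axis names are an optional addition so the printout looks closer to the expected output:

```python
import pandas as pd

df = pd.DataFrame({
    'student_id': ['abc100', 'abc101', 'abc102', 'abc103',
                   'abc104', 'abc105', 'abc106', 'abc107'],
    'rep': [1, 2, 1, 2, 1, 2, 1, 2],
})

# cumcount numbers the rows within each rep group: 0, 1, 2, 3
df['aux'] = df.groupby('rep').cumcount()

# aux becomes the row index, each distinct rep value becomes a column
result = df.pivot(index='aux', columns='rep', values='student_id')

# Optional: drop the axis names so the table matches the expected output
result.index.name = None
result.columns.name = None
print(result)
```

Because rows are matched by their position within each rep group rather than by their position in the whole frame, this does not depend on the input rows alternating 1, 2, 1, 2.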
Yuca
  • Almost... I would've done `df.assign(index=df.groupby('rep').cumcount()).pivot(index='index', columns='rep', values='student_id')` to avoid modifying the original, but this is effectively more efficient. +1 – cs95 Dec 17 '18 at 16:22
  • When people seem relatively new, I prefer to provide the slower but more readable solution. However, one-liners are just too pretty and elegant.
You can build the result by slicing the column using iloc with a step argument. This assumes the rows strictly alternate between rep 1 and rep 2:

>>> pd.DataFrame({'student_id':df['student_id'].iloc[::2].values, 'student_id_1':df['student_id'].iloc[1::2].values})
  student_id student_id_1
0     abc100       abc101
1     abc102       abc103
2     abc104       abc105
3     abc106       abc107
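A caveat worth checking: the iloc slicing is purely positional, so it only produces the right pairs when the rows strictly alternate 1, 2, 1, 2. The sketch below uses hypothetical, re-ordered data (not from the question) to compare it with the groupby/cumcount pivot, which matches rows by their rep value instead:

```python
import pandas as pd

# Hypothetical data: same students, but all rep-1 rows come first
df = pd.DataFrame({
    'student_id': ['abc100', 'abc102', 'abc104', 'abc106',
                   'abc101', 'abc103', 'abc105', 'abc107'],
    'rep': [1, 1, 1, 1, 2, 2, 2, 2],
})

# Positional: even-numbered rows vs odd-numbered rows, ignoring rep
by_position = pd.DataFrame({
    'student_id': df['student_id'].iloc[::2].values,
    'student_id_1': df['student_id'].iloc[1::2].values,
})

# Label-based: number rows within each rep, then pivot on that number
by_rep = (df.assign(index=df.groupby('rep').cumcount())
            .pivot(index='index', columns='rep', values='student_id'))

print(by_position)  # pairs are mixed up, e.g. abc100 next to abc102
print(by_rep)       # rep 1 and rep 2 values stay in their own columns
```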

Or, another way, as @coldspeed suggested in the comments; I'm repeating it here for wider visibility :-)

df.assign(index=df.groupby('rep').cumcount()).pivot(index='index', columns='rep', values='student_id')
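As cs95 points out in a comment on the first answer, assign adds the helper column to a new DataFrame, so the original is left untouched. A quick check, assuming the question's sample data (note that pandas 2.x requires the arguments of pivot to be passed by keyword):

```python
import pandas as pd

df = pd.DataFrame({
    'student_id': ['abc100', 'abc101', 'abc102', 'abc103',
                   'abc104', 'abc105', 'abc106', 'abc107'],
    'rep': [1, 2, 1, 2, 1, 2, 1, 2],
})

# assign returns a new DataFrame with the extra 'index' column;
# df itself keeps only its original two columns
result = (df.assign(index=df.groupby('rep').cumcount())
            .pivot(index='index', columns='rep', values='student_id'))

print(result)
print(df.columns.tolist())  # ['student_id', 'rep']
```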
Karn Kumar