1

I have a dataframe df of the following type:

ID   Result Other_val
1    A      y
2    B      x
2    A      x
3    C      abc

After using pd.crosstab(df.ID, df.Result), I get a crosstab like this:

Result  A  B  C
ID
1       1  0  0
2       1  1  0
3       0  0  1

I wish to now concatenate these values to the end of my original dataframe (after removing the already present Result column) to get something like this:

ID   A   B   C   Other_val
1    1   0   0   y
2    1   1   0   x
3    0   0   1   abc

However, I am stumped. I can't seem to use pd.concat() to form the above table because of the strange way the cross-tab table is indexed.

Any help?

Cleb
  • 25,102
  • 20
  • 116
  • 151
mDe
  • 97
  • 1
  • 10

2 Answers2

3

I think what you're looking for is pd.crosstab([df.ID, df.Other_val], df.Result), because you need to group ID and Other_val.

In [5]: pd.crosstab([df.ID, df.Other_val], df.Result)
Out[5]: 
Result        A  B  C
ID Other_val         
1  y          1  0  0
2  x          1  1  0
3  abc        0  0  1
louis_guitton
  • 5,105
  • 1
  • 31
  • 33
  • 1
    That's far cleaner than mine and should probably the accepted answer (+ 1); did not even think about it but just blindly merged the dataframes... ;) You could still add a `.reset_index()`, if you like... – Cleb Dec 16 '17 at 01:30
  • 1
    That's why I at least upvoted your answer and also recommended to accept yours (see comment below my answer). – Cleb Dec 16 '17 at 01:34
  • I just came across this! Much cleaner and efficient- thank you – mDe Dec 16 '17 at 19:57
1

You can do

df2 = pd.crosstab(df.ID, df.Result)

and then either

df_final = df.drop('Result', axis=1).drop_duplicates('ID').join(df2, on='ID')

which gives

   ID Other_val  A  B  C
0   1         y  1  0  0
1   2         x  1  1  0
3   3       abc  0  0  1

If you want to reorder the columns you obtain your desired outcome

df_final = df_final[['ID', 'A', 'B', 'C', 'Other_val']]

   ID  A  B  C Other_val
0   1  1  0  0         y
1   2  1  1  0         x
3   3  0  0  1       abc

You can also use concat like this

pd.concat([df.drop('Result', axis=1).drop_duplicates('ID').set_index('ID'), df2], axis=1)

   Other_val  A  B  C
ID                   
1          y  1  0  0
2          x  1  1  0
3        abc  0  0  1
Cleb
  • 25,102
  • 20
  • 116
  • 151
  • @mDe: Glad it helped; you might want to look at laguittemh's answer though, which seems far cleaner (you can always unaccept my answer and accept his). I did not even think about this but just blindly merged the dataframes... :) – Cleb Dec 16 '17 at 01:32