1

I have a dataframe:

df
ID 0   1   2    3  4 .... 
1  10  20  5    1  2 ....
2  3   4   NaN    10 1 ....

And I need to transpose the cell values of the column 0,1,2,3,4... to the column headers, and fill it for the Id's with 1 if the cell value is present for the respective ID.

Desired Output:

ID 1 2 3 4 5 ... 10 20 ..
1  1 1 0 0 1 ... 1  1  ..
2  1 0 1 1 0 ... 1  0  ..

Note that some entries can be NaN.

How can I get the desired output?

PV8
  • 5,799
  • 7
  • 43
  • 87

2 Answers2

3

Use DataFrame.set_index with DataFrame.stack for remove missing values, then create indicators by get_dummies and return 1/0 by max by first level, last convert columns to integers:

df1 = (pd.get_dummies(df.set_index('ID').stack())
         .max(level=0)
         .rename(columns=int)
         .reset_index())
print (df1)
   ID  1  2  3  4  5  10  20
0   1  1  1  0  0  1   1   1
1   2  1  0  1  1  0   1   0

EDIT:

print (df)
   ID   0   1    2   3  4  5
0   1  10  20  5.0   1  2  5
1   2   3   4  NaN  10  1  2

If use max then always in output are 0/1 values (check 5 column):

df1 = (pd.get_dummies(df.set_index('ID').stack())
         .max(level=0)
         .rename(columns=int)
         .reset_index())
print (df1)
   ID  1  2  3  4  5  10  20
0   1  1  1  0  0  1   1   1
1   2  1  1  1  1  0   1   0

But if use sum it count values (check 5 column):

df2 = (pd.get_dummies(df.set_index('ID').stack())
         .sum(level=0)
         .rename(columns=int)
         .reset_index())
print (df2)
   ID  1  2  3  4  5  10  20
0   1  1  1  0  0  2   1   1
1   2  1  1  1  1  0   1   0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

Another way using melt and pd.crosstab

df1 = df.melt('ID')
df_final =  pd.crosstab(index=df1.ID, columns=df1.value).reset_index()

Out[673]:
value  ID  1.0  2.0  3.0  4.0  5.0  10.0  20.0
0      1   1    1    0    0    1    1     1
1      2   1    0    1    1    0    1     0

Note: default counting of pd.crosstab uses frequency. Therefore, duplicate values will count as their frequencies. If you want only 1/0 indicator, just chain ge(1) and astype as follows

pd.crosstab(index=df1.ID, columns=df1.value).ge(1).astype(int).reset_index()
Andy L.
  • 24,909
  • 4
  • 17
  • 29