
I would like to reverse the get_dummies encoding for a DataFrame with several categorical columns ("A" and "B" in this example):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

   A  B  C
0  a  b  1
1  b  a  2
2  a  c  3

df = pd.get_dummies(df)

   C  A_a  A_b  B_a  B_b  B_c
0  1    1    0    0    1    0
1  2    0    1    1    0    0
2  3    1    0    0    0    1
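
For reference, get_dummies names every dummy column "<original column><prefix_sep><value>", with prefix_sep defaulting to "_", and leaves non-categorical columns such as C untouched. Any inversion has to rely on that naming. The 0/1 table above matches older pandas; pandas 2.0 and later return bool dummies by default, so this small sketch passes dtype=int to reproduce the 0/1 layout:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# Dummy columns are named "<column><prefix_sep><value>", e.g. "A_a", "B_c".
# Non-categorical columns (here the integer column "C") are kept as-is and come first.
# dtype=int gives 0/1 columns; without it, pandas >= 2.0 produces bool columns.
dummies = pd.get_dummies(df, prefix_sep='_', dtype=int)
print(dummies)
```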

Now, when inverting the dummy df, the result should restore the two separate columns "A" and "B" instead of stacking them into a single column like this:

   C Col2
0  1  B_b
1  2  A_b
2  3  B_c

I've tried:

df[df == 1].stack().reset_index().drop(columns=0)

   level_0 level_1
0        0       C
1        0     A_a
2        0     B_b
3        1     A_b
4        1     B_a
5        2     A_a
6        2     B_c
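
The stacked result above already has one row per active dummy; what is missing is splitting each label back into its original column and value and pivoting. A minimal sketch of finishing that idea, assuming the dummy columns are exactly the ones whose names contain "_" (true here, since "C" has none) and that the original column names themselves contain no "_":

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
dummies = pd.get_dummies(df, dtype=int)

# Assumption: only dummy columns contain "_", so "C" is excluded and never matched.
dummy_cols = [c for c in dummies.columns if '_' in c]

# Long format: one row per (original row, dummy column) that is set to 1.
hits = dummies[dummy_cols].stack()
hits = hits[hits == 1].reset_index()
hits.columns = ['row', 'dummy', 'flag']

# Split "A_a" on the first "_" only: "A" is the original column, "a" the value.
parts = hits['dummy'].str.split('_', n=1, expand=True)
hits['column'] = parts[0]
hits['value'] = parts[1]

# Pivot back to one row per original row and one column per original column.
restored = hits.pivot(index='row', columns='column', values='value')
restored = restored.join(dummies['C'])
restored.columns.name = None
restored.index.name = None
print(restored)
```

Splitting with n=1 keeps values that themselves contain "_" intact; only the first "_" is treated as the separator.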

df.idxmax(axis=1)

0    C
1    C
2    C
dtype: object
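
idxmax(axis=1) returns "C" in every row because it compares all columns at once and C holds values (1, 2, 3) at least as large as any dummy flag; in row 0 the tie is resolved in favour of the first column, which is C. Restricting idxmax to one group of dummy columns at a time avoids this. A sketch, assuming the original categorical column names ('A', 'B') are known and every dummy column is named "<column>_<value>":

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
dummies = pd.get_dummies(df, dtype=int)

restored = pd.DataFrame(index=dummies.index)
for col in ['A', 'B']:  # assumption: the encoded column names are known up front
    # Only this column's dummies, e.g. ["A_a", "A_b"], so "C" cannot win.
    group = [c for c in dummies.columns if c.startswith(col + '_')]
    # idxmax gives the label of the 1 in each row ("A_b"); drop the "A_" prefix.
    restored[col] = dummies[group].idxmax(axis=1).str[len(col) + 1:]
restored['C'] = dummies['C']
print(restored)
```

This assumes exactly one 1 per group in each row; a row whose group is all zeros would silently get the group's first label.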

v = np.argwhere(df.drop(columns='C').values).T
t = pd.DataFrame({'C': df.loc[v[0], 'C'], 'Col2': df.columns[1:][v[1]]})

t
   C Col2
0  1  B_b
1  2  A_b
2  3  B_c

df2 = df.select_dtypes(include=['object'])
df2.astype('category')

The reverse operation should give back the original DataFrame:

   A  B  C
0  a  b  1
1  b  a  2
2  a  c  3
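
In newer pandas (1.5 and later), pd.from_dummies performs exactly this inversion when given the separator, so no manual name splitting is needed. It only accepts dummy columns, so C has to be set aside and joined back afterwards. A sketch, assuming every column other than C is a dummy column:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})
dummies = pd.get_dummies(df)

# from_dummies rejects non-dummy data, so decode only the dummy columns.
dummy_cols = [c for c in dummies.columns if c != 'C']

# sep='_' groups "A_a", "A_b" into column "A" (and "B_*" into "B"),
# using the text after the first "_" as the category value.
decoded = pd.from_dummies(dummies[dummy_cols], sep='_')

# Re-attach the untouched numeric column.
restored = decoded.join(dummies[['C']])
print(restored)
```

By default from_dummies raises a ValueError if a row has no active dummy in some group; its default_category parameter supplies a value for that case.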

Thank you for your help in advance!

Marius
  • Kindly post the full output of your inversion of the dummy df. Also, what have you tried? Where did you get stuck? – sammywemmy Apr 30 '20 at 11:06
  • I've updated the question with the methods I've tried. I got stuck on reverting the encoding without splitting the column names on the "_" separator in a loop. – Marius Apr 30 '20 at 11:23
