0

I have a question - how to insert multiple (for example 3) columns to the DataFrame on the location of an existing column? In other words, I had a column with some categorical values which I encode with one-hot encoding - as a result, I obtained 3 new columns. Now, I want to drop the original column and insert resulting columns on its location (and not to the end of data frame). Any ideas of how to do it efficiently? I'll appreciate any help.

**df1 - Original datafarme** :

   col1 col2  col3
0   4    A    0.5
1   5    B    0.78
2   6    C    0.55
3   7    A    0.78

**df2 - created one-hot encoding of categorical col2** :

   col2_A col2_B  col2_C
0   1       0       0
1   0       1       0
2   0       0       1
3   1       0       0

How to insert columns of df2 to df1, instead of col2 to obtain:

**Updated df1**

   col1 col2_A col2_b col2_C  col3
0   4    1      0        0    0.5
1   5    0      1        0    0.78
2   6    0      0        1    0.55
3   7    1      0        0    0.78
PasDeSence
  • 39
  • 4

3 Answers3

1

use

df_concat = pd.concat([df1, df2], axis=1)

then to drop that col_2 use

df_concat.drop(['col_2'], axis = 1)
Andy_101
  • 1,246
  • 10
  • 20
0

Solution with any new columns in df2 (not necessary starts with col2)

Use Index.get_loc for positions, so possible filter by positions with DataFrame.iloc values before and after column and then join together in concat, last if necessary remove column:

val = 'col2'
p = df.columns.get_loc(val)

#possible solution for dummies, be free use your solution
#df2 = pd.get_dummies(df[val])
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1).drop(val, axis=1)
print (df)

   col1  A  B  C  col3
0     4  1  0  0  0.50
1     5  0  1  0  0.78
2     6  0  0  1  0.55
3     7  1  0  0  0.78

If prefixes are necessary:

val = 'col2'
p = df.columns.get_loc(val)
#possible solution for dummies, be free use your solution
#df2 = pd.get_dummies(df[[val]])
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1).drop(val, axis=1)
print (df)

   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78

Or use DataFrame.pop in get_dummies or another solution:

val = 'col2'
p = df.columns.get_loc(val)
#possible solution for dummies, be free use your solution
#df2 = pd.get_dummies(df.pop(val))
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1)
print (df)

   col1  A  B  C  col3
0     4  1  0  0  0.50
1     5  0  1  0  0.78
2     6  0  0  1  0.55
3     7  1  0  0  0.78
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

if the indexes are aligned correctly use DataFrame.join. DataFrame.drop to remove col2 and DataFrame.sort_index to order columns

df1.join(df2).drop(columns = 'col2').sort_index(axis = 1)

   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78

We can also use DataFrame.pivot_table instead pd.get_dummies

new_df = (df1.join(df1.pivot_table(columns = 'col2',
                                   index = df1.index,
                                   aggfunc = 'size',
                                   fill_value = 0)
                      .add_prefix('col2_'))
              .drop(columns = 'col2')
              .sort_index(axis = 1))
print(new_df)
   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78
ansev
  • 30,322
  • 5
  • 17
  • 31