Insert several columns instead of one existing to pandas dataframe

Question

I have a question - how to insert multiple (for example 3) columns to the DataFrame on the location of an existing column? In other words, I had a column with some categorical values which I encode with one-hot encoding - as a result, I obtained 3 new columns. Now, I want to drop the original column and insert resulting columns on its location (and not to the end of data frame). Any ideas of how to do it efficiently? I'll appreciate any help.

**df1 - Original datafarme** :

   col1 col2  col3
0   4    A    0.5
1   5    B    0.78
2   6    C    0.55
3   7    A    0.78

**df2 - created one-hot encoding of categorical col2** :

   col2_A col2_B  col2_C
0   1       0       0
1   0       1       0
2   0       0       1
3   1       0       0

How to insert columns of df2 to df1, instead of col2 to obtain:

**Updated df1**

   col1 col2_A col2_b col2_C  col3
0   4    1      0        0    0.5
1   5    0      1        0    0.78
2   6    0      0        1    0.55
3   7    1      0        0    0.78

if possible put the to be added `df` separately and use `concat` or `join` or `merge` as per the requirement. — Abhishake Gupta, Feb 10 '20 at 12:56
`df1.join(df2).drop(columns = 'col2').sort_index(axis = 1) ` isn't it simpler? ? — ansev, Feb 10 '20 at 13:28

score 1 · Answer 1 · answered Feb 10 '20 at 12:51

1

use

df_concat = pd.concat([df1, df2], axis=1)

then to drop that col_2 use

df_concat.drop(['col_2'], axis = 1)

answered Feb 10 '20 at 12:51

Andy_101

1,246
10
20

score 0 · Accepted Answer · edited Feb 10 '20 at 21:46

Solution with any new columns in df2 (not necessary starts with col2)

Use Index.get_loc for positions, so possible filter by positions with DataFrame.iloc values before and after column and then join together in concat, last if necessary remove column:

val = 'col2'
p = df.columns.get_loc(val)

#possible solution for dummies, be free use your solution
#df2 = pd.get_dummies(df[val])
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1).drop(val, axis=1)
print (df)

   col1  A  B  C  col3
0     4  1  0  0  0.50
1     5  0  1  0  0.78
2     6  0  0  1  0.55
3     7  1  0  0  0.78

If prefixes are necessary:

val = 'col2'
p = df.columns.get_loc(val)
#possible solution for dummies, be free use your solution
#df2 = pd.get_dummies(df[[val]])
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1).drop(val, axis=1)
print (df)

   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78

Or use DataFrame.pop in get_dummies or another solution:

val = 'col2'
p = df.columns.get_loc(val)
#possible solution for dummies, be free use your solution
#df2 = pd.get_dummies(df.pop(val))
df = pd.concat([df.iloc[:, :p], df2, df.iloc[:, p:]], axis=1)
print (df)

   col1  A  B  C  col3
0     4  1  0  0  0.50
1     5  0  1  0  0.78
2     6  0  0  1  0.55
3     7  1  0  0  0.78

the question is `How to insert columns of df2 to df1, instead of col2 to obtain:` — ansev, Feb 10 '20 at 13:20

ansev · Answer 3 · 2020-02-10T13:16:47.287

0

if the indexes are aligned correctly use DataFrame.join. DataFrame.drop to remove col2 and DataFrame.sort_index to order columns

df1.join(df2).drop(columns = 'col2').sort_index(axis = 1)

   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78

We can also use DataFrame.pivot_table instead pd.get_dummies

new_df = (df1.join(df1.pivot_table(columns = 'col2',
                                   index = df1.index,
                                   aggfunc = 'size',
                                   fill_value = 0)
                      .add_prefix('col2_'))
              .drop(columns = 'col2')
              .sort_index(axis = 1))
print(new_df)
   col1  col2_A  col2_B  col2_C  col3
0     4       1       0       0  0.50
1     5       0       1       0  0.78
2     6       0       0       1  0.55
3     7       1       0       0  0.78

edited Feb 10 '20 at 13:16

answered Feb 10 '20 at 13:08

ansev

30,322
5
17
31

????????? check output – ansev Feb 10 '20 at 13:19
Do you know the OP problem and the OP DataFrame? I'm starting to think so – ansev Feb 10 '20 at 13:21
OP already has df2, I have simply made a note that pivot_table also works for this case, but the question is not to get the `dummies` – ansev Feb 10 '20 at 13:23
Also changed my solution, thanks for pointing it. – jezrael Feb 10 '20 at 13:26

Insert several columns instead of one existing to pandas dataframe

3 Answers3