Inserting multiple columns in dataframe at specific location based on another dataframe

Question

I am facing this issue, where I have one dataframe let's say df1:

>>> df1 = pd.DataFrame(data = np.arange(45).reshape(-1,9), columns = ['A1','B1', 'C1', 
                                                       'A2', 'B2', 'C2','A3','B3','C3'])
>>> df1

   A1  B1  C1  A2  B2  C2  A3  B3  C3
0   0   1   2   3   4   5   6   7   8
1   9  10  11  12  13  14  15  16  17
2  18  19  20  21  22  23  24  25  26
3  27  28  29  30  31  32  33  34  35
4  36  37  38  39  40  41  42  43  44

And another dataframe df2:

>>> df2 = pd.DataFrame(data = np.arange(15).reshape(-1,3), columns = ['AB1','AB2','AB3'])
>>> df2

   AB1  AB2  AB3
0    0    1    2
1    3    4    5
2    6    7    8
3    9   10   11
4   12   13   14

Now what I want is, to insert columns of df2 into df1 in specific positions, so that df1 becomes (actually a new df would work too):

>>> df1

   A1  AB1  B1  C1  A2  AB2  B2  C2  A3  AB3  B3  C3
0   0    0   1   2   3    1   4   5   6    2   7   8
1   9    3  10  11  12    4  13  14  15    5  16  17
2  18    6  19  20  21    7  22  23  24    8  25  26
3  27    9  28  29  30   10  31  32  33   11  34  35
4  36   12  37  38  39   13  40  41  42   14  43  44

I am now achieving this by creating a new empty df, then iterating over columns of both the df's, and then adding each column sequentially. Which is inefficient, ugly and defeats the whole purpose of DataFrames. So I would like to know is there already a method for this? I am not sure if such question has already been answered here, but I am sure I didn't find any results. If this has been discussed before, I will be glad if someone points me to it.

jezrael · Accepted Answer · 2019-09-27T07:24:46.893

You can use concat and then sorting with sorted by multiple attributes and change order by new values with subset - []:

df = pd.concat([df1, df2], axis=1)
cols = sorted(df.columns, key = lambda x: (x[-1], x[:-1]))   
print (cols)
['A1', 'AB1', 'B1', 'C1', 'A2', 'AB2', 'B2', 'C2', 'A3', 'AB3', 'B3', 'C3']

df = df[cols]
print (df)
   A1  AB1  B1  C1  A2  AB2  B2  C2  A3  AB3  B3  C3
0   0    0   1   2   3    1   4   5   6    2   7   8
1   9    3  10  11  12    4  13  14  15    5  16  17
2  18    6  19  20  21    7  22  23  24    8  25  26
3  27    9  28  29  30   10  31  32  33   11  34  35
4  36   12  37  38  39   13  40  41  42   14  43  44

Another idea is create MultiIndex in columns and sorting by second level, last flatten MultiIndex back by map:

df1.columns = [df1.columns.str[:1], df1.columns.str[1]]
df2.columns = [df2.columns.str[:2], df2.columns.str[2]]

df = pd.concat([df1, df2],axis=1).sort_index(axis=1, level=1)
df.columns = df.columns.map(''.join)
print (df)
   A1  AB1  B1  C1  A2  AB2  B2  C2  A3  AB3  B3  C3
0   0    0   1   2   3    1   4   5   6    2   7   8
1   9    3  10  11  12    4  13  14  15    5  16  17
2  18    6  19  20  21    7  22  23  24    8  25  26
3  27    9  28  29  30   10  31  32  33   11  34  35
4  36   12  37  38  39   13  40  41  42   14  43  44

EDIT:

For more general solution is possible use DataFrame.reindex with MultiIndex.from_product:

df1 = pd.DataFrame(data = np.arange(45).reshape(-1,9), 
                   columns = ['A1','B1', 'C1', 'A2', 'B2', 'C2','A3','B3','C3'])
df2 = pd.DataFrame(data = np.arange(15).reshape(-1,3), columns = ['Sum1','Sum2','Sum3'])


#change order of values in MultiIndex
df1.columns = [df1.columns.str[1], df1.columns.str[:1]]
df2.columns = [df2.columns.str[3], df2.columns.str[:3]]

mux = pd.MultiIndex.from_product([['1','2','3'], ['A','Sum','B','C']])
print (mux)
MultiIndex([('1',   'A'),
            ('1', 'Sum'),
            ('1',   'B'),
            ('1',   'C'),
            ('2',   'A'),
            ('2', 'Sum'),
            ('2',   'B'),
            ('2',   'C'),
            ('3',   'A'),
            ('3', 'Sum'),
            ('3',   'B'),
            ('3',   'C')],
           )

df = pd.concat([df1, df2], axis=1).reindex(columns=mux)
df.columns = [f'{b}{a}' for a, b in df.columns]
print (df)
   A1  Sum1  B1  C1  A2  Sum2  B2  C2  A3  Sum3  B3  C3
0   0     0   1   2   3     1   4   5   6     2   7   8
1   9     3  10  11  12     4  13  14  15     5  16  17
2  18     6  19  20  21     7  22  23  24     8  25  26
3  27     9  28  29  30    10  31  32  33    11  34  35
4  36    12  37  38  39    13  40  41  42    14  43  44

Thanks, this is a really smart solution. Could you keep your previous version as well? — Sayandip Dutta, Sep 27 '19 at 07:08
Would this method work if the column names in the second df were not so alphabetically compatible? For example, would this method work if the df2 had column names: `['Sum1','Sum2','Sum3']`? — Sayandip Dutta, Sep 27 '19 at 07:16
Actually your answer solves my requirement, I am just making sure if I face something like this in future. — Sayandip Dutta, Sep 27 '19 at 07:16
Yes, but in the case I mentioned, wouldn't the sorted columns be in this order: `['A','B','C','Sum']` — Sayandip Dutta, Sep 27 '19 at 07:18
Exactly, what if I want this to be in specifically `['A','Sum','B','C']` — Sayandip Dutta, Sep 27 '19 at 07:19
No problem, you have provided me a good platform, I think I can figure it out from here. — Sayandip Dutta, Sep 27 '19 at 07:19
I found a solution too: `cols = np.insert(df1.columns.values,range(1,9,3),df2.columns.values)`. Here `9` and `3` can be calculated based on the usecase. Then `df = pd.concat([df1, df2], axis=1)` and `df[cols]` — Sayandip Dutta, Sep 27 '19 at 07:36

Sayandip Dutta · Answer 2 · 2019-09-27T07:47:02.703

Based on @jezrael's answer, I found another solution:

cols = np.insert(df1.columns,range(1,len(df1.columns.values),3),df2.columns)
df = pd.concat([df1, df2], axis=1)
df[cols]

   A1  Sum1  B1  C1  A2  Sum2  B2  C2  A3  Sum3  B3  C3
0   0     0   1   2   3     1   4   5   6     2   7   8
1   9     3  10  11  12     4  13  14  15     5  16  17
2  18     6  19  20  21     7  22  23  24     8  25  26
3  27     9  28  29  30    10  31  32  33    11  34  35
4  36    12  37  38  39    13  40  41  42    14  43  44

Inserting multiple columns in dataframe at specific location based on another dataframe

2 Answers2

Linked