2

I am facing this issue, where I have one dataframe let's say df1:

>>> df1 = pd.DataFrame(data = np.arange(45).reshape(-1,9), columns = ['A1','B1', 'C1', 
                                                       'A2', 'B2', 'C2','A3','B3','C3'])
>>> df1

   A1  B1  C1  A2  B2  C2  A3  B3  C3
0   0   1   2   3   4   5   6   7   8
1   9  10  11  12  13  14  15  16  17
2  18  19  20  21  22  23  24  25  26
3  27  28  29  30  31  32  33  34  35
4  36  37  38  39  40  41  42  43  44

And another dataframe df2:

>>> df2 = pd.DataFrame(data = np.arange(15).reshape(-1,3), columns = ['AB1','AB2','AB3'])
>>> df2

   AB1  AB2  AB3
0    0    1    2
1    3    4    5
2    6    7    8
3    9   10   11
4   12   13   14

Now what I want is, to insert columns of df2 into df1 in specific positions, so that df1 becomes (actually a new df would work too):

>>> df1

   A1  AB1  B1  C1  A2  AB2  B2  C2  A3  AB3  B3  C3
0   0    0   1   2   3    1   4   5   6    2   7   8
1   9    3  10  11  12    4  13  14  15    5  16  17
2  18    6  19  20  21    7  22  23  24    8  25  26
3  27    9  28  29  30   10  31  32  33   11  34  35
4  36   12  37  38  39   13  40  41  42   14  43  44

I am now achieving this by creating a new empty df, then iterating over columns of both the df's, and then adding each column sequentially. Which is inefficient, ugly and defeats the whole purpose of DataFrames. So I would like to know is there already a method for this? I am not sure if such question has already been answered here, but I am sure I didn't find any results. If this has been discussed before, I will be glad if someone points me to it.

Sayandip Dutta
  • 15,602
  • 4
  • 23
  • 52

2 Answers2

2

You can use concat and then sorting with sorted by multiple attributes and change order by new values with subset - []:

df = pd.concat([df1, df2], axis=1)
cols = sorted(df.columns, key = lambda x: (x[-1], x[:-1]))   
print (cols)
['A1', 'AB1', 'B1', 'C1', 'A2', 'AB2', 'B2', 'C2', 'A3', 'AB3', 'B3', 'C3']

df = df[cols]
print (df)
   A1  AB1  B1  C1  A2  AB2  B2  C2  A3  AB3  B3  C3
0   0    0   1   2   3    1   4   5   6    2   7   8
1   9    3  10  11  12    4  13  14  15    5  16  17
2  18    6  19  20  21    7  22  23  24    8  25  26
3  27    9  28  29  30   10  31  32  33   11  34  35
4  36   12  37  38  39   13  40  41  42   14  43  44

Another idea is create MultiIndex in columns and sorting by second level, last flatten MultiIndex back by map:

df1.columns = [df1.columns.str[:1], df1.columns.str[1]]
df2.columns = [df2.columns.str[:2], df2.columns.str[2]]

df = pd.concat([df1, df2],axis=1).sort_index(axis=1, level=1)
df.columns = df.columns.map(''.join)
print (df)
   A1  AB1  B1  C1  A2  AB2  B2  C2  A3  AB3  B3  C3
0   0    0   1   2   3    1   4   5   6    2   7   8
1   9    3  10  11  12    4  13  14  15    5  16  17
2  18    6  19  20  21    7  22  23  24    8  25  26
3  27    9  28  29  30   10  31  32  33   11  34  35
4  36   12  37  38  39   13  40  41  42   14  43  44 

EDIT:

For more general solution is possible use DataFrame.reindex with MultiIndex.from_product:

df1 = pd.DataFrame(data = np.arange(45).reshape(-1,9), 
                   columns = ['A1','B1', 'C1', 'A2', 'B2', 'C2','A3','B3','C3'])
df2 = pd.DataFrame(data = np.arange(15).reshape(-1,3), columns = ['Sum1','Sum2','Sum3'])


#change order of values in MultiIndex
df1.columns = [df1.columns.str[1], df1.columns.str[:1]]
df2.columns = [df2.columns.str[3], df2.columns.str[:3]]

mux = pd.MultiIndex.from_product([['1','2','3'], ['A','Sum','B','C']])
print (mux)
MultiIndex([('1',   'A'),
            ('1', 'Sum'),
            ('1',   'B'),
            ('1',   'C'),
            ('2',   'A'),
            ('2', 'Sum'),
            ('2',   'B'),
            ('2',   'C'),
            ('3',   'A'),
            ('3', 'Sum'),
            ('3',   'B'),
            ('3',   'C')],
           )

df = pd.concat([df1, df2], axis=1).reindex(columns=mux)
df.columns = [f'{b}{a}' for a, b in df.columns]
print (df)
   A1  Sum1  B1  C1  A2  Sum2  B2  C2  A3  Sum3  B3  C3
0   0     0   1   2   3     1   4   5   6     2   7   8
1   9     3  10  11  12     4  13  14  15     5  16  17
2  18     6  19  20  21     7  22  23  24     8  25  26
3  27     9  28  29  30    10  31  32  33    11  34  35
4  36    12  37  38  39    13  40  41  42    14  43  44
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Based on @jezrael's answer, I found another solution:

cols = np.insert(df1.columns,range(1,len(df1.columns.values),3),df2.columns)
df = pd.concat([df1, df2], axis=1)
df[cols]

   A1  Sum1  B1  C1  A2  Sum2  B2  C2  A3  Sum3  B3  C3
0   0     0   1   2   3     1   4   5   6     2   7   8
1   9     3  10  11  12     4  13  14  15     5  16  17
2  18     6  19  20  21     7  22  23  24     8  25  26
3  27     9  28  29  30    10  31  32  33    11  34  35
4  36    12  37  38  39    13  40  41  42    14  43  44
Sayandip Dutta
  • 15,602
  • 4
  • 23
  • 52