6

I have two dataframes as follows:

df2 = pd.DataFrame(np.random.randn(5,2),columns=['A','C'])
df3 = pd.DataFrame(np.random.randn(5,2),columns=['B','D'])

I wish to get the columns in an alternating fashion such that I get the result below:

df4 = pd.DataFrame()
for i in range(len(df2.columns)):
    df4[df2.columns[i]]=df2[df2.columns[i]]
    df4[df3.columns[i]]=df3[df3.columns[i]]

df4 

    A   B   C   D
0   1.056889    0.494769    0.588765    0.846133
1   1.536102    2.015574    -1.279769   -0.378024
2   -0.097357   -0.886320   0.713624    -1.055808
3   -0.269585   -0.512070   0.755534    0.855884
4   -2.691672   -0.597245   1.023647    0.278428

I think I'm being really inefficient with this solution. What is the more pythonic/ pandic way of doing this?

p.s. In my specific case the column names are not A,B,C,D and aren't alphabetically arranged. Just so know which two dataframes I want to combine.

sachinruk
  • 9,571
  • 12
  • 55
  • 86

4 Answers4

9

If you need something more dynamic, first zip both columns names of both DataFrames and then flat it:

df5 = pd.concat([df2, df3], axis=1)
print (df5)
          A         C         B         D
0  0.874226 -0.764478  1.022128 -1.209092
1  1.411708 -0.395135 -0.223004  0.124689
2  1.515223 -2.184020  0.316079 -0.137779
3 -0.554961 -0.149091  0.179390 -1.109159
4  0.666985  1.879810  0.406585  0.208084

#http://stackoverflow.com/a/10636583/2901002
print (list(sum(zip(df2.columns, df3.columns), ())))
['A', 'B', 'C', 'D']
print (df5[list(sum(zip(df2.columns, df3.columns), ()))])
          A         B         C         D
0  0.874226  1.022128 -0.764478 -1.209092
1  1.411708 -0.223004 -0.395135  0.124689
2  1.515223  0.316079 -2.184020 -0.137779
3 -0.554961  0.179390 -0.149091 -1.109159
4  0.666985  0.406585  1.879810  0.208084
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I understand that the zip puts the columns names such that you get `[('A', 'B'), ('C', 'D')]`, and I can see what `sum` has done, but can you explain what `sum` has done? It's especially hard to understand why that empty tuple `()` helps with expanding that zipped array – sachinruk Jul 27 '16 at 00:52
  • This just sums the elements of iterable passed in the first argument, treating second argument as the initial value of the sum (if not given, 0 is used instead and this case will give you an error). So output is `tuple` - `('A', 'B', 'C', 'D')` which is converted to `list`. – jezrael Jul 27 '16 at 05:19
5

How about this?

df4 = pd.concat([df2, df3], axis=1)

Or do they have to be in a specific order? Anyway, you can always reorder them:

df4 = df4[['A','B','C','D']]

And without writing out the columns:

df4 = df4[[item for items in zip(df2.columns, df3.columns) for item in items]]
kloffy
  • 2,928
  • 2
  • 25
  • 34
  • 1
    _"I want to get the columns in an alternating fashion"_ So one from the first column followed by one from the second column then the second from the first column, etc.. – miradulo Jul 26 '16 at 06:26
  • @Mitch Gotcha, added a bit on reordering the columns. – kloffy Jul 26 '16 at 06:27
  • This is A solution, but I guess the point is rather than me writing out the column names to be able to get columns in a alternating fashion. – sachinruk Jul 26 '16 at 06:35
  • You were few seconds late for getting the answer accepted, @jezrael beat you to it. Sorry – sachinruk Jul 26 '16 at 06:37
4

You could concat and then reindex_axis.

df = pd.concat([df2, df3], axis=1)
df.reindex_axis(df.columns[::2].tolist() + df.columns[1::2].tolist(), axis=1)
Alex
  • 18,484
  • 8
  • 60
  • 80
2

Append even indices to df2 columns and odd indices to df3 columns. Use these new levels to sort.

df2_ = df2.T.set_index(np.arange(len(df2.columns)) * 2, append=True).T
df3_ = df3.T.set_index(np.arange(len(df3.columns)) * 2 + 1, append=True).T

df = pd.concat([df2_, df3_], axis=1).sort_index(1, 1)
df.columns = df.columns.droplevel(1)

df

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624