Suppose I have two dataframes:

import pandas
....
....
test1 = pandas.DataFrame([1,2,3,4,5])
....
....
test2 = pandas.DataFrame([4,2,1,3,7])
....

I tried test1.append(test2), but that is the equivalent of R's rbind.

How can I combine the two as two columns of a dataframe similar to the cbind function in R?

smci
uday
  • have you considered changing which answer is accepted? I think Feng Mai's answer is far more complete. – cphlewis Aug 23 '22 at 16:51
  • Sorry, I needed the answer in 2015, not in 2021 !!! Not fair to change the answer - particularly not fair to the person who responded to me 7 years ago when I needed the answer – uday Sep 07 '22 at 02:22
  • I don’t like to go back 7 years ago. I appreciate you answering the question back then, but it’s no point in getting an answer 7 years later when I have long left Python for C# / Java and not interested in the answer anymore – uday Sep 07 '22 at 02:24

2 Answers

import pandas as pd
# axis=1 places the frames side by side, like R's cbind
test3 = pd.concat([test1, test2], axis=1)
test3.columns = ['a','b']

(But see the detailed answer by @feng-mai, below)
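
The column renaming matters because both input frames carry a single column labelled 0, so the concatenated result has two columns with the same label. A minimal sketch of what that looks like, using the question's data (the name `raw` is only illustrative):

```python
import pandas as pd

test1 = pd.DataFrame([1, 2, 3, 4, 5])
test2 = pd.DataFrame([4, 2, 1, 3, 7])

# Without renaming, the result keeps both original labels: [0, 0].
raw = pd.concat([test1, test2], axis=1)
print(list(raw.columns))

# Selecting the duplicated label returns both columns, not a single Series.
print(raw[0].shape)

# Assigning distinct names makes each column individually addressable.
test3 = raw.copy()
test3.columns = ["a", "b"]
print(test3["b"].tolist())
```

With distinct names in place, `test3["a"]` and `test3["b"]` each return a single Series.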

cphlewis

There is a key difference between concat(axis = 1) in pandas and cbind() in R:

concat attempts to merge/align by index, whereas cbind() in R simply binds rows by position. If the two pandas dataframes' indexes are misaligned, the result differs from cbind (even if they have the same number of rows). You need to either make sure the indexes align or drop/reset them.

Example:

import pandas as pd

test1 = pd.DataFrame([1,2,3,4,5])
test1.index = ['a','b','c','d','e']
test2 = pd.DataFrame([4,2,1,3,7])
test2.index = ['d','e','f','g','h']

pd.concat([test1, test2], axis=1)

     0    0
a  1.0  NaN
b  2.0  NaN
c  3.0  NaN
d  4.0  4.0
e  5.0  2.0
f  NaN  1.0
g  NaN  3.0
h  NaN  7.0

pd.concat([test1.reset_index(drop=True), test2.reset_index(drop=True)], axis=1)

   0  0
0  1  4
1  2  2
2  3  1
3  4  3
4  5  7

pd.concat([test1.reset_index(), test2.reset_index(drop=True)], axis=1)      

  index  0  0
0     a  1  4
1     b  2  2
2     c  3  1
3     d  4  3
4     e  5  7
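
If positional binding is needed repeatedly, the reset-then-concat pattern above could be wrapped in a small helper. This is only a sketch: the function name `cbind` and the row-count check are assumptions made for illustration, not part of pandas.

```python
import pandas as pd


def cbind(*frames):
    """Bind DataFrames column-wise by row position, ignoring their indexes.

    Hypothetical helper, not a pandas API.
    """
    # Refuse inputs of different lengths instead of silently padding with NaN.
    row_counts = {len(frame) for frame in frames}
    if len(row_counts) > 1:
        raise ValueError(f"row counts differ: {sorted(row_counts)}")
    # Dropping each index makes concat pair rows by position.
    return pd.concat([frame.reset_index(drop=True) for frame in frames], axis=1)


test1 = pd.DataFrame([1, 2, 3, 4, 5], index=list("abcde"))
test2 = pd.DataFrame([4, 2, 1, 3, 7], index=list("defgh"))

combined = cbind(test1, test2)
combined.columns = ["a", "b"]
print(combined)
```

Raising on mismatched lengths is a design choice here: without the check, concat would still succeed and fill the shorter frame's missing rows with NaN.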
Feng Mai
  • This index issue is absolutely central and yours should for that reason be the accepted answer. Anybody coming from R, looking for cbind, will need to know this. – Thomas Browne Aug 21 '22 at 21:24