156

This is probably easy, but I have the following data:

In data frame 1:

index dat1
0     9
1     5

In data frame 2:

index dat2
0     7
1     6

I want a data frame with the following form:

index dat1  dat2
0     9     7
1     5     6

I've tried using the append method, but I get a cross join (i.e. cartesian product).

What's the right way to do this?

BenDundee
  • 4,389
  • 3
  • 28
  • 34

6 Answers6

186

It seems in general you're just looking for a join:

> dat1 = pd.DataFrame({'dat1': [9,5]})
> dat2 = pd.DataFrame({'dat2': [7,6]})
> dat1.join(dat2)
   dat1  dat2
0     9     7
1     5     6
U2EF1
  • 12,907
  • 3
  • 35
  • 37
  • 73
    Or `pd.concat([dat1, dat2], axis=1)` in this case. – DSM Dec 16 '13 at 03:35
  • 4
    @BenDundee Join and concat use a lot of the same code under the hood, so the "right" way probably only matters when you consider edge cases. For instance here if both DataFrames had a 'data' column the join would *fail*, whereas a concat would give you two columns named 'data'. – U2EF1 Dec 16 '13 at 20:37
  • @U2EF1: I was talking about your response vs. mine. There are always N ways to skin a cat :) – BenDundee Dec 16 '13 at 21:47
  • @BenDundee I see. That method discards the unique index and has even weirder side effects in more complicated cases, though. For instance if I had two columns named 'data', grouping/summing would start summing up the different data columns, which is almost certainly not what you want. String data would be concatenated. – U2EF1 Dec 16 '13 at 22:13
  • 3
    As pointed by @jeremy-z, it is very important to reset indexes in both dataset if they don't share same index. Otherwise you will get one dataset with lot of NaNs rows. – Israel Varea Jun 18 '19 at 10:21
  • @DSM concat does not work for me, when dat1 is a new empty dataframe – FabioSpaghetti Sep 18 '19 at 06:04
  • Dear @U2EF1 & @DSM , I pass a data_frame to a function and apply these two methods (`join and concat`), but after the execution of function the manipulation do not applied in the passed data_frame. – Hosein Aqajani Mar 27 '20 at 07:50
76

You can also use:

dat1 = pd.concat([dat1, dat2], axis=1)
Ella Cohen
  • 1,375
  • 1
  • 10
  • 14
  • 1
    In case you encounter `InvalidIndexError: Reindexing only valid with uniquely valued Index objects `, you can use: `pd.concat([dat1.reset_index(), dat2], axis=1)` – beyondfloatingpoint Aug 27 '19 at 09:21
64

Both join() and concat() way could solve the problem. However, there is one warning I have to mention: Reset the index before you join() or concat() if you trying to deal with some data frame by selecting some rows from another DataFrame.

One example below shows some interesting behavior of join and concat:

dat1 = pd.DataFrame({'dat1': range(4)})
dat2 = pd.DataFrame({'dat2': range(4,8)})
dat1.index = [1,3,5,7]
dat2.index = [2,4,6,8]

# way1 join 2 DataFrames
print(dat1.join(dat2))
# output
   dat1  dat2
1     0   NaN
3     1   NaN
5     2   NaN
7     3   NaN

# way2 concat 2 DataFrames
print(pd.concat([dat1,dat2],axis=1))
#output
   dat1  dat2
1   0.0   NaN
2   NaN   4.0
3   1.0   NaN
4   NaN   5.0
5   2.0   NaN
6   NaN   6.0
7   3.0   NaN
8   NaN   7.0

#reset index 
dat1 = dat1.reset_index(drop=True)
dat2 = dat2.reset_index(drop=True)
#both 2 ways to get the same result

print(dat1.join(dat2))
   dat1  dat2
0     0     4
1     1     5
2     2     6
3     3     7


print(pd.concat([dat1,dat2],axis=1))
   dat1  dat2
0     0     4
1     1     5
2     2     6
3     3     7
Chris Tang
  • 567
  • 7
  • 18
Jeremy Z
  • 2,050
  • 1
  • 13
  • 15
  • Well said and good point. I tried without resetting index and generated a whole lot NULLS – Anand Nov 14 '17 at 15:10
  • Without doing the reset step, my data looked fine and good, but obviously something didn't work well behind the scenes. Thanks for pointing it out! The reset got my model up and running! – Ionut Ciuta Mar 26 '18 at 21:38
  • This should be the accepted answer! It always generates NaN s if we do not reset index. – Srivatsan Feb 11 '20 at 01:12
  • This step saved me. I was trying to understand why either concat and join was throwing a lot of NaNs. Thanks for sharing this. – Gustavo Rottgering Jul 06 '20 at 18:08
  • Why do I have to reset the index? I tried it without reseting the index and it works fine – PeterBe Jun 08 '21 at 13:50
5

Perhaps too simple by anyways...

dat1 = pd.DataFrame({'dat1': [9,5]})
dat2 = pd.DataFrame({'dat2': [7,6]})
dat1['dat2'] = dat2  # Uses indices from dat1

Result:

    dat1  dat2
0     9     7
1     5     6
MarMat
  • 790
  • 8
  • 12
2

You can assign a new column. Use indices to align correspoding rows:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': [100, 200, 300]}, index=[1, 2, 3])

df1['C'] = df2['C']

Result:

   A   B      C
0  1  10    NaN
1  2  20  100.0
2  3  30  200.0

Ignore indices:

df1['C'] = df2['C'].reset_index(drop=True)

Result:

   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
-8

Just a matter of the right google search:

data = dat_1.append(dat_2)
data = data.groupby(data.index).sum()
BenDundee
  • 4,389
  • 3
  • 28
  • 34