Append column to pandas dataframe

Question

This is probably easy, but I have the following data:

In data frame 1:

index dat1
0     9
1     5

In data frame 2:

index dat2
0     7
1     6

I want a data frame with the following form:

index dat1  dat2
0     9     7
1     5     6

I've tried using the append method, but I get a cross join (i.e. cartesian product).

What's the right way to do this?

@lowtech: does that ensure that the indices are paired up properly? — BenDundee, Dec 16 '13 at 21:48

score 186 · Accepted Answer · answered Dec 16 '13 at 03:33

186

It seems in general you're just looking for a join:

> dat1 = pd.DataFrame({'dat1': [9,5]})
> dat2 = pd.DataFrame({'dat2': [7,6]})
> dat1.join(dat2)
   dat1  dat2
0     9     7
1     5     6

answered Dec 16 '13 at 03:33

U2EF1

12,907
3
35
37

73

Or `pd.concat([dat1, dat2], axis=1)` in this case. – DSM Dec 16 '13 at 03:35
4

@BenDundee Join and concat use a lot of the same code under the hood, so the "right" way probably only matters when you consider edge cases. For instance here if both DataFrames had a 'data' column the join would *fail*, whereas a concat would give you two columns named 'data'. – U2EF1 Dec 16 '13 at 20:37
@U2EF1: I was talking about your response vs. mine. There are always N ways to skin a cat :) – BenDundee Dec 16 '13 at 21:47
@BenDundee I see. That method discards the unique index and has even weirder side effects in more complicated cases, though. For instance if I had two columns named 'data', grouping/summing would start summing up the different data columns, which is almost certainly not what you want. String data would be concatenated. – U2EF1 Dec 16 '13 at 22:13
3

As pointed by @jeremy-z, it is very important to reset indexes in both dataset if they don't share same index. Otherwise you will get one dataset with lot of NaNs rows. – Israel Varea Jun 18 '19 at 10:21
@DSM concat does not work for me, when dat1 is a new empty dataframe – FabioSpaghetti Sep 18 '19 at 06:04
Dear @U2EF1 & @DSM , I pass a data_frame to a function and apply these two methods (`join and concat`), but after the execution of function the manipulation do not applied in the passed data_frame. – Hosein Aqajani Mar 27 '20 at 07:50

score 76 · Answer 2 · answered Sep 07 '16 at 10:00

76

You can also use:

dat1 = pd.concat([dat1, dat2], axis=1)

answered Sep 07 '16 at 10:00

Ella Cohen

1,375
1
10
14

1

In case you encounter `InvalidIndexError: Reindexing only valid with uniquely valued Index objects `, you can use: `pd.concat([dat1.reset_index(), dat2], axis=1)` – beyondfloatingpoint Aug 27 '19 at 09:21

score 64 · Answer 3 · edited May 23 '20 at 09:17

64

Both join() and concat() way could solve the problem. However, there is one warning I have to mention: Reset the index before you join() or concat() if you trying to deal with some data frame by selecting some rows from another DataFrame.

One example below shows some interesting behavior of join and concat:

dat1 = pd.DataFrame({'dat1': range(4)})
dat2 = pd.DataFrame({'dat2': range(4,8)})
dat1.index = [1,3,5,7]
dat2.index = [2,4,6,8]

# way1 join 2 DataFrames
print(dat1.join(dat2))
# output
   dat1  dat2
1     0   NaN
3     1   NaN
5     2   NaN
7     3   NaN

# way2 concat 2 DataFrames
print(pd.concat([dat1,dat2],axis=1))
#output
   dat1  dat2
1   0.0   NaN
2   NaN   4.0
3   1.0   NaN
4   NaN   5.0
5   2.0   NaN
6   NaN   6.0
7   3.0   NaN
8   NaN   7.0

#reset index 
dat1 = dat1.reset_index(drop=True)
dat2 = dat2.reset_index(drop=True)
#both 2 ways to get the same result

print(dat1.join(dat2))
   dat1  dat2
0     0     4
1     1     5
2     2     6
3     3     7


print(pd.concat([dat1,dat2],axis=1))
   dat1  dat2
0     0     4
1     1     5
2     2     6
3     3     7

edited May 23 '20 at 09:17

Chris Tang

567
7
18

answered Sep 15 '17 at 07:57

Jeremy Z

2,050
1
13
15

Well said and good point. I tried without resetting index and generated a whole lot NULLS – Anand Nov 14 '17 at 15:10
Without doing the reset step, my data looked fine and good, but obviously something didn't work well behind the scenes. Thanks for pointing it out! The reset got my model up and running! – Ionut Ciuta Mar 26 '18 at 21:38
This should be the accepted answer! It always generates NaN s if we do not reset index. – Srivatsan Feb 11 '20 at 01:12
This step saved me. I was trying to understand why either concat and join was throwing a lot of NaNs. Thanks for sharing this. – Gustavo Rottgering Jul 06 '20 at 18:08
Why do I have to reset the index? I tried it without reseting the index and it works fine – PeterBe Jun 08 '21 at 13:50

score 5 · Answer 4 · answered Apr 26 '22 at 10:18

5

Perhaps too simple by anyways...

dat1 = pd.DataFrame({'dat1': [9,5]})
dat2 = pd.DataFrame({'dat2': [7,6]})
dat1['dat2'] = dat2  # Uses indices from dat1

Result:

    dat1  dat2
0     9     7
1     5     6

answered Apr 26 '22 at 10:18

MarMat

790
8
12

Mykola Zotko · Answer 5 · 2022-03-31T06:49:08.167

You can assign a new column. Use indices to align correspoding rows:

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': [100, 200, 300]}, index=[1, 2, 3])

df1['C'] = df2['C']

Result:

   A   B      C
0  1  10    NaN
1  2  20  100.0
2  3  30  200.0

Ignore indices:

df1['C'] = df2['C'].reset_index(drop=True)

Result:

   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300

BenDundee · Answer 6 · 2013-12-16T13:31:52.163

-8

Just a matter of the right google search:

data = dat_1.append(dat_2)
data = data.groupby(data.index).sum()

edited Dec 16 '13 at 13:31

answered Dec 16 '13 at 03:27

BenDundee

4,389
3
28
34

Append column to pandas dataframe

6 Answers6

Linked

Related