How do I combine two dataframes?

Question

I have an initial DataFrame D. I extract two DataFrames from it like this:

A = D[D.label == k]
B = D[D.label != k]

I want to combine A and B into one DataFrame. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.

Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — Gonçalo Peres, Nov 02 '20 at 14:41
From `pandas v1.4.1`: The `frame.append` method is deprecated and will be removed from pandas in a future version. Use `pandas.concat` instead. — Trenton McKinney, Apr 28 '22 at 17:06

score 253 · Accepted Answer · edited Jul 01 '22 at 23:34

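DEPRECATED: DataFrame.append and Series.append were deprecated in pandas v1.4.0 and removed in pandas 2.0. On current versions, use pd.concat instead.

On pandas versions before 2.0, use append:

df_merged = df1.append(df2, ignore_index=True)

To keep the original indexes, set ignore_index=False.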
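
Since append is deprecated, here is a minimal sketch of the same two behaviours using pd.concat. The contents of D and the value of k are made up for illustration; only the names A, B, D and k come from the question.

```python
import pandas as pd

# Hypothetical data standing in for the question's D, with k = 1.
D = pd.DataFrame({"label": [0, 1, 0, 1], "value": [10, 20, 30, 40]})
k = 1
A = D[D.label == k]   # keeps index labels 1 and 3 from D
B = D[D.label != k]   # keeps index labels 0 and 2 from D

# ignore_index=True renumbers the combined rows 0..n-1.
renumbered = pd.concat([A, B], ignore_index=True)

# ignore_index=False (the default) keeps the labels inherited from D.
kept = pd.concat([A, B], ignore_index=False)

print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(kept.index.tolist())        # [1, 3, 0, 2]
```

If the original row order of D matters later, kept.sort_index() should restore it, because the labels from D are still attached.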

edited Jul 01 '22 at 23:34

Mateen Ulhaq

answered Oct 12 '12 at 00:07

Joran Beasley

This works. It creates a new DataFrame though. Is there a way to do it inline? That would be nice for when I'm loading huge amounts of data from a database in batches so I could iteratively update the DataFrame without creating a copy each time. – Andrew Nov 05 '13 at 17:36
Yes, that's possible, see: https://stackoverflow.com/a/46661368/5717580 – martin-martin Oct 10 '17 at 07:55

score 201 · Answer 2 · edited Jul 01 '22 at 23:36

Use pd.concat to join multiple dataframes:

df_merged = pd.concat([df1, df2], ignore_index=True, sort=False)
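
To illustrate what sort=False affects, here is a small sketch with two made-up frames whose columns only partly overlap. The column names and values are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical frames: columns overlap only partly and are in a different order.
df1 = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
df2 = pd.DataFrame({"c": [5], "a": [6]})

# sort=False keeps the columns in order of first appearance.
df_merged = pd.concat([df1, df2], ignore_index=True, sort=False)

# sort=True orders the combined columns alphabetically instead.
df_sorted = pd.concat([df1, df2], ignore_index=True, sort=True)

print(df_merged.columns.tolist())  # ['b', 'a', 'c']
print(df_sorted.columns.tolist())  # ['a', 'b', 'c']
```

In both cases, cells that have no source value (for example column c for the rows that came from df1) are filled with NaN.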

edited Jul 01 '22 at 23:36

Mateen Ulhaq

answered May 31 '15 at 11:47

ostrokach

I want to use this, but I'm trying to concatenate two columns of the same name o_O – lifelonglearner Apr 01 '20 at 02:13

score 93 · Answer 3 · edited Jul 01 '22 at 23:37

Merge across rows:

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)

Merge across columns:

df_col_merged = pd.concat([df_a, df_b], axis=1)
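
A quick sketch of the difference between the two calls, using two small made-up frames with the same column names:

```python
import pandas as pd

# Hypothetical frames sharing the column names x and y.
df_a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
df_b = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)  # stacks the rows
df_col_merged = pd.concat([df_a, df_b], axis=1)             # places the frames side by side

print(df_row_merged.shape)              # (4, 2)
print(df_col_merged.shape)              # (2, 4)
print(df_col_merged.columns.tolist())   # ['x', 'y', 'x', 'y']
```

Note that the column-wise result keeps duplicate column names when the inputs share them, so selecting df_col_merged["x"] returns two columns rather than one.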

edited Jul 01 '22 at 23:37

Mateen Ulhaq

answered Sep 22 '16 at 08:38

pelumi

score 33 · Answer 4 · edited Sep 15 '21 at 09:08

If you're working with big data and need to concatenate multiple datasets, calling concat many times can become expensive, because each call copies the data.

If you don't want to create a new df each time, you can instead aggregate the changes and call concat only once:

frames = [df_A, df_B]  # Or perform operations on the DFs
result = pd.concat(frames)

This is pointed out in the pandas docs, at the bottom of the section on concatenating objects:

Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
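
For the batch-loading case raised in the comments, the same idea can be sketched as below. The load_batches generator is a hypothetical stand-in for whatever produces each batch (for example a database query); it is not a pandas function.

```python
import pandas as pd

def load_batches():
    """Hypothetical stand-in for reading batches from a database."""
    for start in range(0, 6, 2):
        yield pd.DataFrame({"id": [start, start + 1]})

frames = []
for batch in load_batches():
    # list.append only stores a reference, so no DataFrame is copied here.
    frames.append(batch)

# One concat at the end copies the data a single time.
result = pd.concat(frames, ignore_index=True)

print(len(result))  # 6
```

Whether the list is built with a loop or a list comprehension makes no difference; what matters is that concat is called once rather than once per batch.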

edited Sep 15 '21 at 09:08

answered Oct 10 '17 at 07:53

martin-martin

I think there should be `pd.concat(frames)` since pandas doesn't have `append` method. – My Work Jan 04 '21 at 09:37
I don't fully understand the list "comprehension" focus. What's important here is not calling append every time and hence gathering all the dataframes into a list first. Whether that list is established through a list comprehension or not is completely irrelevant. – MrR Apr 27 '21 at 19:06
Thanks for the very relevant comments, I updated the answer to address them. – martin-martin May 14 '21 at 07:55
what is the intended definition of the process_file(f) function? – lrthistlethwaite Sep 14 '21 at 17:57
That was meant as an example for performing operations on the individual DFs before concatenating them, but I see it's less helpful than I initially thought. Updated the answer, thanks. – martin-martin Sep 15 '21 at 09:10

score 7 · Answer 5 · answered Jan 09 '20 at 22:45

If you want to update or replace the values of the first DataFrame df1 with the values of the second DataFrame df2, you can do it with the following steps.

Step 1: Set the index of the first DataFrame (df1). set_index returns a new DataFrame, so assign the result back:

df1 = df1.set_index('id')

Step 2: Set the index of the second DataFrame (df2):

df2 = df2.set_index('id')

Finally, update df1 in place using the following snippet:

df1.update(df2)
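
Put together, the steps might look like the sketch below. The id and value columns and their contents are made up for illustration. Because set_index returns a new DataFrame rather than modifying the original, the sketch assigns the result back.

```python
import pandas as pd

# Hypothetical data: df2 carries new values for ids 2 and 3.
df1 = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "value": [99.0, 77.0]})

df1 = df1.set_index("id")
df2 = df2.set_index("id")

# update modifies df1 in place, aligning on the index, and returns None.
df1.update(df2)

print(df1["value"].tolist())  # [10.0, 99.0, 77.0]
```

Rows of df1 whose id does not appear in df2 (id 1 here) are left unchanged, and update does not add new rows.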

score 2 · Answer 6 · answered Oct 06 '22 at 14:13

To join 2 pandas dataframes by column, using their indices as the join key, you can do this:

both = a.join(b)

If you want to join multiple DataFrames, Series, or a mixture of them by their index, put them in a list, for example:

everything = a.join([b, c, d])

See the pandas docs for DataFrame.join().
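
A minimal sketch of both calls, assuming three small made-up frames that share the same index labels and have distinct column names:

```python
import pandas as pd

# Hypothetical frames sharing the index labels r1 and r2.
a = pd.DataFrame({"x": [1, 2]}, index=["r1", "r2"])
b = pd.DataFrame({"y": [3, 4]}, index=["r1", "r2"])
c = pd.DataFrame({"z": [5, 6]}, index=["r1", "r2"])

both = a.join(b)             # columns x and y, matched on the index
everything = a.join([b, c])  # columns x, y and z

print(both.columns.tolist())        # ['x', 'y']
print(everything.columns.tolist())  # ['x', 'y', 'z']
```

If two frames share a column name, a.join(b) raises an error unless lsuffix or rsuffix is passed to disambiguate the overlapping columns.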

score 0 · Answer 7 · answered Jun 08 '22 at 18:57

import pandas as pd

# collect excel content into list of dataframes
# (excel_files and excelAutoNamed are assumed to be defined elsewhere)
data = []
for excel_file in excel_files:
    data.append(pd.read_excel(excel_file, engine="openpyxl"))

# concatenate dataframes horizontally
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)

You can try the above when you are appending horizontally. Hope this helps someone.

score 0 · Answer 8 · answered Aug 23 '22 at 09:41

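Use this code to attach two pandas DataFrames horizontally:

df3 = pd.concat([df1, df2], axis=1, ignore_index=True, sort=False)

You must specify the axis along which to combine the two frames; axis=1 places them side by side. Note that with axis=1, ignore_index=True replaces the column labels with 0, 1, ...; omit it to keep the original column names.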
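
One detail worth checking before using this call: ignore_index applies to the axis being concatenated, so with axis=1 it affects the column labels rather than the row index. A small sketch with made-up frames:

```python
import pandas as pd

# Hypothetical single-column frames.
df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})

# With axis=1, ignore_index=True discards the column names.
df3 = pd.concat([df1, df2], axis=1, ignore_index=True, sort=False)

# Leaving ignore_index at its default keeps them.
df_named = pd.concat([df1, df2], axis=1)

print(df3.columns.tolist())       # [0, 1]
print(df_named.columns.tolist())  # ['a', 'b']
```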

answered Aug 23 '22 at 09:41

Farzad Amirjavid

score 0 · Answer 9 · answered Mar 19 '23 at 08:54

Both DataFrames should have the same column names; otherwise, instead of the records being appended row-wise into shared columns, the unmatched columns are added as separate columns filled with NaN.

df = df.append(df1, ignore_index=True)  # pandas < 2.0 only
df = pd.concat([df1, df2], ignore_index=True)
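
A short sketch of the column-name effect, and one possible way to handle it by renaming before concatenating. The frames and the column names score and points are made up for illustration.

```python
import pandas as pd

# Hypothetical frames: "points" in df2 is meant to be the same field as "score" in df1.
df1 = pd.DataFrame({"name": ["x"], "score": [1]})
df2 = pd.DataFrame({"name": ["y"], "points": [2]})

# Mismatched names: the result has three columns, with NaN where a value is missing.
mismatched = pd.concat([df1, df2], ignore_index=True)

# Renaming first lets the rows line up in a single "score" column.
aligned = pd.concat(
    [df1, df2.rename(columns={"points": "score"})],
    ignore_index=True,
)

print(mismatched.columns.tolist())  # ['name', 'score', 'points']
print(aligned["score"].tolist())    # [1, 2]
```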

answered Mar 19 '23 at 08:54

Senthil Kumar Vaithiyanathan
