Python Pandas merge only certain columns

Question

Is it possible to merge only some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a, b, c, d, e, f, etc.

I want to merge the two DataFrames on x, but I only want to merge columns df2.a, df2.b - not the entire DataFrame.

The result would be a DataFrame with x, y, z, a, b.

I could merge then delete the unwanted columns, but it seems like there is a better method.

Andy: Holy cow that was easy...I need a break, I'm obviously making this too complicated. Thanks for the clarity! — BubbleGuppies, Jul 31 '13 at 19:07

score 277 · Answer 1 · answered Mar 13 '17 at 14:18

You want to use TWO brackets to select a list of columns from df2. So if you are doing a VLOOKUP sort of action:

df = pd.merge(df, df2[['Key_Column', 'Target_Column']], on='Key_Column', how='left')

This keeps everything in the original df and adds the one corresponding column from df2 that you want to join.
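
A minimal sketch of the lookup above, on made-up data. The frames `orders` and `products` and the columns `qty` and `unwanted` are hypothetical stand-ins for df, df2 and their other columns. It shows that only the selected column is carried over and that unmatched keys get NaN under how='left':

```python
import pandas as pd

# Hypothetical data: `orders` plays the role of df, `products` the role of df2.
orders = pd.DataFrame({"Key_Column": [1, 2, 3], "qty": [10, 20, 30]})
products = pd.DataFrame({
    "Key_Column": [1, 2],
    "Target_Column": ["apple", "pear"],
    "unwanted": ["x", "y"],
})

# The inner list names the columns to keep; the outer brackets index the DataFrame.
result = pd.merge(orders, products[["Key_Column", "Target_Column"]],
                  on="Key_Column", how="left")

# Key 3 has no match in `products`, so its Target_Column is NaN.
print(result)
```

The key column has to be part of the selection, otherwise there is nothing to join on.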

answered Mar 13 '17 at 14:18

Arthur D. Howland

Can `Target_Column` be a list of columns? – Gathide Oct 12 '20 at 10:20
I believe this should be the accepted answer. @BubbleGuppies – rmmariano Apr 23 '21 at 19:25
@Gathide Yes, there can be multiple target columns like `df2[['key','target1','target2']]` – Cornelius Roemer Jul 07 '21 at 15:58

score 108 · Accepted Answer · edited Oct 27 '15 at 07:05

You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])
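
As a rough illustration on the frames from the question (the values are invented), the merge below joins on x, the only column the two frames share, and returns x, y, z, a, b:

```python
import pandas as pd

# Invented values; column names follow the question.
df1 = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})
df2 = pd.DataFrame({"x": [1, 2], "a": [7, 8], "b": [9, 10], "c": [11, 12]})

cols = list("xab")             # splits the string into ['x', 'a', 'b']
merged = df1.merge(df2[cols])  # no `on` given: joins on the shared column x

print(merged.columns.tolist())
```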

edited Oct 27 '15 at 07:05

beroe

answered Jul 31 '13 at 18:46

Andy Hayden

Hmmm, I wonder if there should be a native way to do this, like subset in dropna... will put together github issue – Andy Hayden Jul 31 '13 at 19:12
Hmmm ... I tried using this to merge column 'Unique_External_Users' from df2 to df1 but got an error ... "None of [Index(['U', 'n', 'i', 'q', 'u', 'e', '_', 'E', 'x', 't', 'e', 'r', 'n', 'a',\n 'l', '_', 'U', 's', 'e', 'r', 's'],\n dtype='object')] are in the [columns]" . – CoolDocMan Feb 28 '20 at 20:48
Here's the code . ... df1.merge(df2('Unique_External_Users')]) – CoolDocMan Feb 28 '20 at 20:53
@CoolDocMan I think you missed something from the proposed answer: `list('xab')` takes each element (letter) of the string 'xab' and converts it to a list element so `list('xab')` returns `['x', 'a', 'b']`. That works if each column has a single letter as a name. In your case I think you need to do df1.merge(df2['Unique_External_Users'], *other_arguments). ...Most probably you already solved it by now, just leaving this for newbies around, like me – SOf_PUAR Jul 03 '20 at 07:11
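
A small sketch of the pitfall discussed in these comments. The join key `id` and the other columns are assumptions, since the original frames were not posted. For a multi-character name, the column list has to be written out rather than built with list():

```python
import pandas as pd

# Hypothetical frames; "id" stands in for whatever key the real data shares.
df1 = pd.DataFrame({"id": [1, 2], "name": ["ann", "bob"]})
df2 = pd.DataFrame({"id": [1, 2], "Unique_External_Users": [5, 7], "other": [0, 0]})

# list() on a string yields its characters, not a one-element list,
# which is why pandas looked for columns named 'U', 'n', 'i', ...
chars = list("Unique_External_Users")

# Write the list explicitly and include the join key.
merged = df1.merge(df2[["id", "Unique_External_Users"]], on="id")
```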

score 46 · Answer 3 · edited Feb 08 '22 at 14:41

If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:

df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
                left_on = 'key2', right_on = 'key1').drop(columns = ['key1'])

The .drop(columns = ['key1']) part removes 'key1' from the resulting data frame, even though it is required for the join in the first place.
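
A sketch of that pattern with invented data (column `v` and the values are assumptions). The right-hand key is needed for the join but does not appear in the output:

```python
import pandas as pd

# Invented data; the key columns are named differently on each side.
df1 = pd.DataFrame({"key2": [1, 2, 3], "v": ["p", "q", "r"]})
df2 = pd.DataFrame({"key1": [1, 2], "a": [10, 20], "b": [30, 40], "c": [0, 0]})

out = df1.merge(df2[["a", "b", "key1"]], how="left",
                left_on="key2", right_on="key1").drop(columns=["key1"])

print(out.columns.tolist())
```

The keyword form drop(columns=[...]) is used here because it does not rely on a positional axis argument.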

edited Feb 08 '22 at 14:41

answered Oct 14 '19 at 10:14

tonneofash

I get the following error if I try this: `KeyError: "['key1'] not found in axis"` – Tanya Branagan Nov 10 '19 at 19:25
try .drop(columns= ['key1']) – psangam Dec 03 '19 at 06:47
Or .drop('key1', axis = 1) – tonneofash Dec 03 '19 at 20:51
or shorter: `.drop('key1', 1)` – maciejwww Oct 15 '20 at 14:21
Very good point. This differs from SQL, where you can `SELECT df1..., df1.,, df2.a, df2.b FROM df1 LEFT JOIN df2 ON df1.key2=df2.key1` (without needing to select either `df1.key2` or `df2.key1`). It is also a good idea to replace the original df1 with the new df1, so that the merge simply adds new columns. – pas-calc Oct 26 '22 at 09:42

score 12 · Answer 4 · edited Dec 22 '16 at 01:09

You can use .iloc (or .loc) to select specific columns with all rows and then merge only that selection. An example is below:

pandas.merge(dataframe1, dataframe2.iloc[:, 0:5], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2 with a left join on 'key'. For dataframe2, .iloc lets you specify the rows and columns you want by integer position. The : selects all rows, and 0:5 selects the first 5 columns (which must include 'key' for the merge to work). You could use .loc to select by name instead, but if you're dealing with long column names, .iloc may be more convenient.
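
A minimal sketch comparing the positional and the label-based selection. The column names `c1` to `c6` and `left_val` are made up for illustration. Both calls pass the same five columns of dataframe2 to the merge:

```python
import pandas as pd

# Made-up frames; 'key' sits in the first five columns of dataframe2.
dataframe1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
dataframe2 = pd.DataFrame({
    "key": [1, 2],
    "c1": [1, 2], "c2": [3, 4], "c3": [5, 6],
    "c4": [7, 8], "c5": [9, 10], "c6": [11, 12],
})

# By position: all rows, columns 0 through 4.
by_position = pd.merge(dataframe1, dataframe2.iloc[:, 0:5], how="left", on="key")

# By name: the same columns, spelled out.
by_name = pd.merge(dataframe1, dataframe2.loc[:, ["key", "c1", "c2", "c3", "c4"]],
                   how="left", on="key")
```

Positional slicing is shorter, but it silently selects different columns if the column order of dataframe2 changes. Selecting by name is more robust in that respect.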

edited Dec 22 '16 at 01:09

Ajean

answered Dec 14 '16 at 20:33

Terrance DeJesus

Beware that [`.loc` will make a copy](https://stackoverflow.com/questions/23296282/what-rules-does-pandas-use-to-generate-a-view-vs-a-copy), and on a large df that can be painful. It might be better to merge then immediately take a column slice in the same expression. – smci Apr 19 '18 at 06:33

score 9 · Answer 5 · edited Jan 14 '19 at 23:19

This is to merge selected columns from two tables.

If table_1 contains t1_a,t1_b,t1_c..,id,..t1_z columns, and table_2 contains t2_a, t2_b, t2_c..., id,..t2_z columns, and only t1_a, id, t2_a are required in the final table, then

mergedCSV = table_1[['t1_a', 'id']].merge(table_2[['t2_a', 'id']], on='id', how='left')
# save the resulting output file
mergedCSV.to_csv('output.csv', index=False)
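
The same idea on small in-memory tables (the values and the extra columns `t1_b` and `t2_b` are invented). To avoid writing a file in this sketch, to_csv is called without a path, which returns the CSV as a string:

```python
import pandas as pd

# Invented tables; only t1_a, id and t2_a should survive the merge.
table_1 = pd.DataFrame({"t1_a": ["p", "q"], "t1_b": [1, 2], "id": [10, 20]})
table_2 = pd.DataFrame({"t2_a": ["u", "v"], "t2_b": [3, 4], "id": [10, 30]})

# Subset both sides before merging so unwanted columns never enter the result.
merged = table_1[["t1_a", "id"]].merge(table_2[["t2_a", "id"]], on="id", how="left")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = merged.to_csv(index=False)
print(csv_text)
```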

score 3 · Answer 6 · answered Jul 07 '21 at 16:05

Slight extension of the accepted answer for multi-character column names, using inner join by default:

df1 = df1.merge(df2[["Key_Column", "Target_Column1", "Target_Column2"]])

This assumes that Key_Column is the only column both dataframes have in common.
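
To illustrate why that assumption matters, here is a sketch with a hypothetical extra shared column, `note`. Without `on`, merge joins on every column the two frames have in common, so passing `on` explicitly is the safer habit:

```python
import pandas as pd

# Hypothetical frames that share two columns: Key_Column and note.
df1 = pd.DataFrame({"Key_Column": [1, 2], "note": ["left", "left"]})
df2 = pd.DataFrame({
    "Key_Column": [1, 2],
    "note": ["right", "right"],
    "Target_Column1": [10, 20],
})

# No `on`: the join uses Key_Column AND note, and the note values never match.
implicit = df1.merge(df2[["Key_Column", "note", "Target_Column1"]])

# Explicit key, and only the wanted column selected from df2.
explicit = df1.merge(df2[["Key_Column", "Target_Column1"]], on="Key_Column")
```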
