0

I have merged two data frames from two large data tables (300k + rows) but when I right_only merged them on various columns, only six rows returned for one merged table, and sometimes the merge returned a table with zero rows. Can someone help me understand what is happening?

Here is an example of the command I used to merge:

right_retail = merged_retail[merged_retail['_merge']=='right_only']

Regards

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • where are you doing merging, in the above code it seems you are just filtering,Can you provide some sample data? – snehil Jun 01 '20 at 12:52
  • If you can load them into pandas dataframes you should be able to use something like https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html `df1.merge(df2, right_on=key)` – Alexander Jansing Jun 01 '20 at 12:59
  • The command you are showing is a filter on the merged data frame. Show us some sample data from your 2 dataframes that you have merged and also the syntax that you have used. – VTi Jun 01 '20 at 12:59
  • This might help you, https://stackoverflow.com/questions/53645882/pandas-merging-101 – sushanth Jun 01 '20 at 13:10
  • Unfortunately I cant share any data. I realize this is pretty unhelpful but what I can say is that I used two databases each with separate tables. Im checking two(located on one db) against another with the ultimate goal to see which rows are mismatched between specific columns of the tables. So I loaded the data in, sliced it by column, then merged on those columns and then filtered using the above command because I want to see which rows are mismatched. But I dont really understand what the filtering in the above command is doing. Im new to python; this is my first project. – KeithRoberts Jun 01 '20 at 13:55
  • in above command, you are selecting **rows** from dataframe **merge_retail** whose value in column named **_merge** is equal to string **'right_only'** , and after that you are giving all the selected rows to a new dataframe **right_retail**. – snehil Jun 01 '20 at 14:13
  • So I have used this code : `merged_retail = pd.merge(slice_df3_retail, slice_df2_retail, how="right", on = "RetailFirm", indicator = True)` Then I have done the following: `right_retail = merged_retail[merged_retail['_merge']=='right_only']` – KeithRoberts Jun 01 '20 at 14:43

1 Answers1

0

This is the syntax for right join and you can add more keys for join.

result = pd.merge(left, right, how='right', on=['key1', 'key2'])
VTi
  • 1,309
  • 6
  • 14