2

I am merging two dataframes with different lengths with the following code :

df1=pd.merge(df1, df2, on='OFFERING_ID',how='left')

The number of rows before the merge is 400 0000, after the merge the number of row is 600000.

How can you solve that please?

Thanks

Jesus Sono
  • 78
  • 11
nask
  • 35
  • 4
  • Doesn't there need to be an `indicator=True` at the end: `df1=pd.merge(df1, df2, on='OFFERING_ID',how='left', indicator=True)`? https://stackoverflow.com/a/32676153/13138080 – Ollie Feb 24 '21 at 20:25
  • Can you please post your desired output? – BiOS Feb 24 '21 at 20:27

2 Answers2

1

The problem isn't the lengths, it's the OFFERING_ID.

In short, OFFERING_ID isn't unique in the second dataframe. So you get more than one match per OFFERING_ID, and thus more lines than the original.

I made an example in repl.it, the code is also pasted below:

import pandas as pd

df1 = pd.DataFrame(
    [
        {"OFFERING_ID": 1, "another_field": "whatever"},
        {"OFFERING_ID": 2, "another_field": "whatever"},
        {"OFFERING_ID": 3, "another_field": "whatever"},
        {"OFFERING_ID": 4, "another_field": "whatever"},
    ]
)

df2 = pd.DataFrame(
    [
        {"OFFERING_ID": "1", "another_field": "whatever"},
        {"OFFERING_ID": 1, "another_field": "whatever"},
        {"OFFERING_ID": 1, "another_field": "whatever"},
    ]
)

print(df1.shape)
print(df2.shape)
print(pd.merge(df1, df2, on="OFFERING_ID", how="left").shape)
Tiago Fassoni
  • 138
  • 1
  • 7
1
offering_id_dfs = []
for id in df1.OFFERING_ID.unique():
    sub_df1 = df1.loc[df1.OFFERING_ID == id , :].reset_index(drop=True)
    sub_df2 = df2.loc[df2.OFFERING_ID == id , :].reset_index(drop=True)
    concat_df = pd.concat([sub_df1, sub_df2], axis=1)
    concat_df["OFFERING_ID"] = id
    offering_id_dfs.append(concat_df)
df3 = pd.concat(offering_id_dfs ).reset_index(drop=True)

That might work as long as each DataFrame contains only one column beside your Offering_ID and all df2.Offering_Id.unique() are in the Set of df1.Offering_Id.unique().

Lukas Kaspras
  • 409
  • 1
  • 5
  • 15