Merge dataframe with different lengths

Question

I am merging two dataframes with different lengths with the following code :

df1=pd.merge(df1, df2, on='OFFERING_ID',how='left')

The number of rows before the merge is 400 0000, after the merge the number of row is 600000.

How can you solve that please?

Thanks

Doesn't there need to be an `indicator=True` at the end: `df1=pd.merge(df1, df2, on='OFFERING_ID',how='left', indicator=True)`? https://stackoverflow.com/a/32676153/13138080 — Ollie, Feb 24 '21 at 20:25

score 1 · Answer 1 · answered Feb 24 '21 at 20:39

The problem isn't the lengths, it's the OFFERING_ID.

In short, OFFERING_ID isn't unique in the second dataframe. So you get more than one match per OFFERING_ID, and thus more lines than the original.

I made an example in repl.it, the code is also pasted below:

import pandas as pd

df1 = pd.DataFrame(
    [
        {"OFFERING_ID": 1, "another_field": "whatever"},
        {"OFFERING_ID": 2, "another_field": "whatever"},
        {"OFFERING_ID": 3, "another_field": "whatever"},
        {"OFFERING_ID": 4, "another_field": "whatever"},
    ]
)

df2 = pd.DataFrame(
    [
        {"OFFERING_ID": "1", "another_field": "whatever"},
        {"OFFERING_ID": 1, "another_field": "whatever"},
        {"OFFERING_ID": 1, "another_field": "whatever"},
    ]
)

print(df1.shape)
print(df2.shape)
print(pd.merge(df1, df2, on="OFFERING_ID", how="left").shape)

score 1 · Accepted Answer · answered Feb 24 '21 at 21:32

offering_id_dfs = []
for id in df1.OFFERING_ID.unique():
    sub_df1 = df1.loc[df1.OFFERING_ID == id , :].reset_index(drop=True)
    sub_df2 = df2.loc[df2.OFFERING_ID == id , :].reset_index(drop=True)
    concat_df = pd.concat([sub_df1, sub_df2], axis=1)
    concat_df["OFFERING_ID"] = id
    offering_id_dfs.append(concat_df)
df3 = pd.concat(offering_id_dfs ).reset_index(drop=True)

That might work as long as each DataFrame contains only one column beside your Offering_ID and all df2.Offering_Id.unique() are in the Set of df1.Offering_Id.unique().

Merge dataframe with different lengths

2 Answers2