Trying to join/merge two dataframes, but when merging I'm either missing data or I duplicate

Question

This is the code I use to create my df.

df = pd.read_csv('xxxx.csv')
df['date'] = pd.to_datetime(df.DeliveryDateID, format='%d/%m/%Y')
df.set_index(df.date, inplace=True)

df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
df['Day'] = df.date.dt.day
df.drop(columns=['DeliveryDateID', 'AcCu', 'date'], inplace=True)

df

'''

df2 = pd.read_csv('xxxx.csv')
df2['date'] = pd.to_datetime(df2.DeliveryDateID, format='%d/%m/%Y')
df2.set_index(df2.date, inplace=True)

df2['Year'] = df2.date.dt.year
df2['Month'] = df2.date.dt.month
df2['Day'] = df2.date.dt.day
df2.drop(columns=['DeliveryDateID', 'AcCu', 'date'], inplace=True)

df2

Then when merging, I get less data for some reason.

df.merge(df2)

datasets:

Can someone please tell me what I'm doing wrong? Added my dataset as asked, please let me know if thats enough info.

df
           Id               Sd  Imp
date            
2021-09-22  000000000000000 16  1490
2021-09-22  000000000000000 1   204
2021-09-22  000000000000000 1   169
2021-09-22  222222222222222 33  520
2021-09-22  333333333333333 4   112
... ... ... ...
2022-09-04  1111111111111111    15  8984
2022-09-04  1111111111111111    11  3223
2022-09-04  1111111111111111    50  2723
2022-09-04  1111111111111111    44  2378
2022-09-04  1111111111111111    284 15806
35344 rows × 3 columns


df2
            Id              Sd  Roc
date            
2021-09-22  000000000000000 16  1372
2021-09-22  000000000000000 1   200
2021-09-22  000000000000000 1   167
2021-09-22  222222222222222 33  516
2021-09-22  333333333333333 4   84
... ... ... ...
2022-09-04  1111111111111111    15  8984
2022-09-04  1111111111111111    11  3223
2022-09-04  1111111111111111    50  2564
2022-09-04  1111111111111111    44  2378
2022-09-04  1111111111111111    284 14560
35344 rows × 3 columns

Current outcome(When executing a merge):

    AccountId      Spd  Imp     Roc
0   333333333333333 16  3456    1372
1   333333333333333 16  3456    6210
2   333333333333333 16  3456    7671
3   333333333333333 16  3456    620
4   333333333333333 16  3456    2077
... ... ... ... ...
5235083 1111111111111111    223 65478   68383
5235084 1111111111111111    99  23456   32097
5235085 0000000000000000    2333    131137  94210
5235086 0000000000000000    1576    54723   31041
5235087 0000000000000000    3456    78201   75491
5235088 rows × 4 columns

Expected outcome (What I'm trying to do. I know my index isn't in here at the moment, but I'm not sure on how to do that either.):

df2
            Id              Sd  Roc  Imp
date            
2021-09-22  000000000000000 16  1372 1490
2021-09-22  000000000000000 1   200  204
2021-09-22  000000000000000 1   167  169
2021-09-22  222222222222222 33  516  520
2021-09-22  333333333333333 4   84
... ... ... ...
2022-09-04  1111111111111111    15  8984  8984
2022-09-04  1111111111111111    11  3223  3224
2022-09-04  1111111111111111    50  2564  2723
2022-09-04  1111111111111111    44  2378  2378
2022-09-04  1111111111111111    284 14560  15806
35344 rows × 4 columns

How does your two dataframes look like ? Can you share a sample of your dataset ? — Timeless, Sep 29 '22 at 00:50
I suppose that when you talk about loosing data, after the merge you mean losing certain columns ? What's your expected output exaclty ? — Timeless, Sep 29 '22 at 01:36
@abokey No, I don't lose any columns actually. But they do double in size! My expected outcome is combine all matching columns, with dataframe that include the comlumns Imp & Roc. Does that make sense? — Zzzzz99, Sep 29 '22 at 01:50

Caio Farias · Answer 1 · 2022-09-29T11:02:59.757

0

Try to look about the "how" parameter, by default this parameter is "inner", so the result will be only the data that are in both dataframes.

It's commmon to use the inner when you want this behavior. Or use the "left" join to return all data in df1.

So, try to use the parameter inside the merge function: how="left"

you can use like: df.merge(df2, how='left')

edited Sep 29 '22 at 11:02

answered Sep 29 '22 at 00:51

Caio Farias

1
2

Where would I put the 'how=' function? – Zzzzz99 Sep 29 '22 at 01:36
you can use like: df.merge(df2, how='left') – Caio Farias Sep 29 '22 at 11:02

Trying to join/merge two dataframes, but when merging I'm either missing data or I duplicate

1 Answers1