This is the code I use to create my df.
df = pd.read_csv('xxxx.csv')
df['date'] = pd.to_datetime(df.DeliveryDateID, format='%d/%m/%Y')
df.set_index(df.date, inplace=True)
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
df['Day'] = df.date.dt.day
df.drop(columns=['DeliveryDateID', 'AcCu', 'date'], inplace=True)
df
'''
df2 = pd.read_csv('xxxx.csv')
df2['date'] = pd.to_datetime(df2.DeliveryDateID, format='%d/%m/%Y')
df2.set_index(df2.date, inplace=True)
df2['Year'] = df2.date.dt.year
df2['Month'] = df2.date.dt.month
df2['Day'] = df2.date.dt.day
df2.drop(columns=['DeliveryDateID', 'AcCu', 'date'], inplace=True)
df2
Then when merging, I get less data for some reason.
df.merge(df2)
datasets:
Can someone please tell me what I'm doing wrong? Added my dataset as asked, please let me know if thats enough info.
df
Id Sd Imp
date
2021-09-22 000000000000000 16 1490
2021-09-22 000000000000000 1 204
2021-09-22 000000000000000 1 169
2021-09-22 222222222222222 33 520
2021-09-22 333333333333333 4 112
... ... ... ...
2022-09-04 1111111111111111 15 8984
2022-09-04 1111111111111111 11 3223
2022-09-04 1111111111111111 50 2723
2022-09-04 1111111111111111 44 2378
2022-09-04 1111111111111111 284 15806
35344 rows × 3 columns
df2
Id Sd Roc
date
2021-09-22 000000000000000 16 1372
2021-09-22 000000000000000 1 200
2021-09-22 000000000000000 1 167
2021-09-22 222222222222222 33 516
2021-09-22 333333333333333 4 84
... ... ... ...
2022-09-04 1111111111111111 15 8984
2022-09-04 1111111111111111 11 3223
2022-09-04 1111111111111111 50 2564
2022-09-04 1111111111111111 44 2378
2022-09-04 1111111111111111 284 14560
35344 rows × 3 columns
Current outcome(When executing a merge):
AccountId Spd Imp Roc
0 333333333333333 16 3456 1372
1 333333333333333 16 3456 6210
2 333333333333333 16 3456 7671
3 333333333333333 16 3456 620
4 333333333333333 16 3456 2077
... ... ... ... ...
5235083 1111111111111111 223 65478 68383
5235084 1111111111111111 99 23456 32097
5235085 0000000000000000 2333 131137 94210
5235086 0000000000000000 1576 54723 31041
5235087 0000000000000000 3456 78201 75491
5235088 rows × 4 columns
Expected outcome (What I'm trying to do. I know my index isn't in here at the moment, but I'm not sure on how to do that either.):
df2
Id Sd Roc Imp
date
2021-09-22 000000000000000 16 1372 1490
2021-09-22 000000000000000 1 200 204
2021-09-22 000000000000000 1 167 169
2021-09-22 222222222222222 33 516 520
2021-09-22 333333333333333 4 84
... ... ... ...
2022-09-04 1111111111111111 15 8984 8984
2022-09-04 1111111111111111 11 3223 3224
2022-09-04 1111111111111111 50 2564 2723
2022-09-04 1111111111111111 44 2378 2378
2022-09-04 1111111111111111 284 14560 15806
35344 rows × 4 columns