0

I'm trying to merge between two dataframes based on three columns that have the same names: "Hour" ,"date" and "name".

dataframe1:

>>>
    Timestamp             date      Hour    name    value
0   2019-06-23 07:00:00  2019-06-23   7      01A    0.033785
1   2019-06-23 08:00:00  2019-06-23   8      01A    0.049993
2   2019-06-23 09:32:00  2019-06-23   9      01A    0.132230
3   2019-06-23 07:00:00  2019-06-23   7      01B    0.141904
4   2019-06-23 08:00:00  2019-06-23   8      01B    0.163292
...

and dataframe2:

>>>
    date        Hour    name    measure1      measure2  measure3 ...
0 2019-06-23      7      01A    0.212         0.341      0.122
1 2019-06-23      8      01A    0.467         0.75       0.89
2 2019-06-23      9      01A    0.142         0.142      0.784
3 2019-06-23      7      01B    0.776         0.452      0.334
4 2019-06-23      8      01B    0.672         0.895      1.324
....

I'm trying to merge between df1 to df2 based on three columns: date, hour and time. it's important to say that there are more information in df2 than df1.This is how I'm trying to do the merge:

mrg = pd.merge(df2, df1,  how='left', left_on=['Hour','date','name'], right_on = ['Hour','date','name'])

The merge works but for some reason the column "values" gets only null values, like this:

>>>
    date        Hour    name    measure1      measure2  measure3 ...  values
0 2019-06-23      7      01A    0.212         0.341      0.122        Nan
1 2019-06-23      8      01A    0.467         0.75       0.89         Nan
2 2019-06-23      9      01A    0.142         0.142      0.784        Nan
3 2019-06-23      7      01B    0.776         0.452      0.334        Nan
4 2019-06-23      8      01B    0.672         0.895      1.324        Nan
....

instead of:

>>>
    date        Hour    name    measure1      measure2  measure3 ...  values
0 2019-06-23      7      01A    0.212         0.341      0.122        0.033785
1 2019-06-23      8      01A    0.467         0.75       0.89         0.049993
2 2019-06-23      9      01A    0.142         0.142      0.784        0.132230
3 2019-06-23      7      01B    0.776         0.452      0.334        0.141904
4 2019-06-23      8      01B    0.672         0.895      1.324        0.163292
....

I thought maybe it failes due to different types but I have changed the columns "name", "date" and "hour" into string type.

My end goal: to join the values column from df1 to df2, based on name,hour and date columns

Edit: when I changed the method to outer it added the values but seems like for some reason it doesn'r preform the merge well. like it doesn't recognize the similar values

Reut
  • 1,555
  • 4
  • 23
  • 55

1 Answers1

0

I got the same issue when I was trying to merge two tables. You can check the number of data points (rows) of these two datasets by df.info(). You are likely to have different number of rows tied to name, hour and date columns, and that's why Python has no idea which exact row you're gonna merge. It could be fixed when your datasets have the same size of rows.

Tzane
  • 2,752
  • 1
  • 10
  • 21