1

I have two dataframes. One called SERVICES and one called TIMES.

I am joining them together like so:

servicesMerged = pd.merge(services, times, left_on='Ref_Id', right_on='Ref_ID')

This is fine and works, except some of the TIMES data is missing a ref_id.

This is service data for a booking system, so for example we might have this:

**TIMES**

Ref_Id  |  TIMES
1  |  30
2  |  15
3  |  10


**SERVICES**

Ref_ID  |  Name
1  |  Mowing
2  |  Raking
3  |  Blowing
4  |  Trimming

What is happening, is we're getting a nice merge, but the service Trimming does not come into the new dataset, as it's missing the time in the times dataframe.

What we need it to do, is, if the time is missing (as per this example) that we add some data, so we'd add say 15 minutes.

Something you would traditionally do like so:

If not exists time:
    Create a time and make it 15

I've tried how = inner, outer, left, right, but nothing works.

How can I, if a row is missing like above, force the data to be added to the merged data?

Thank you.

robster
  • 626
  • 1
  • 7
  • 22

1 Answers1

1

Creating the dfs like this:

times = pd.DataFrame({'Ref_Id':[1,2,3],
                      'TIMES':[30, 15, 10]})

services = pd.DataFrame({'Ref_ID':[1,2,3,4],
                         'Name':['Mowing', 'Raking', 'Blowing', 'Trimming']})

Then you should just be able to add how='left' to your code (note I had to swap your left_on and right_on, as the capital D in Ref_ID is in the left service table):

servicesMerged = pd.merge(services, times, left_on='Ref_ID', right_on='Ref_Id', how='left')

   Ref_ID      Name  Ref_Id  TIMES
0       1    Mowing     1.0   30.0
1       2    Raking     2.0   15.0
2       3   Blowing     3.0   10.0
3       4  Trimming     NaN    NaN

alternatively, you can write it like this:

servicesMerged = services.merge(times, left_on='Ref_ID', right_on='Ref_Id', how='left')

To fill in the blank times, you can use .fillna():

servicesMerged['TIMES'] = servicesMerged['TIMES'].fillna(15)

   Ref_ID      Name  Ref_Id  TIMES
0       1    Mowing     1.0   30.0
1       2    Raking     2.0   15.0
2       3   Blowing     3.0   10.0
3       4  Trimming     NaN   15.0

NB: If you were to have the Ref_Id column name to match in both tables (either both Ref_Id or both Ref_ID) you could then just use on='Ref_Id' instead of both left and right on, and then you wouldn't get the second Ref_Id column in the output.

Emi OB
  • 2,814
  • 3
  • 13
  • 29
  • Thank you so much for this. I fully understand what you're saying here (at least I'm sure I do), however, after doing a how='left' I am still missing the 'Trimming'. I not only don't get the Trimming with NaN. I simply don't get a Trimming in the results. Therefore I can't move onto the step to .fillna() as there is no data in the dataframe with an NaN to fill. I have no idea why it's doing this, but it's just not adding the Trimming as, what I had assumed was, due to the time value not being present. I'm stumped! – robster Oct 13 '22 at 22:02
  • aaaand I do apologise. It DID work. I realised I was performing a second merge later, with another data set (prices) and it was doing an outer merge, which was removing the data. But, I learned so much and I want to thank you for your time. It's highly appreciated. – robster Oct 13 '22 at 22:31
  • 1
    @robster we've all been there :p glad you've got it sorted – Emi OB Oct 14 '22 at 15:02