1

I have a dataset with null values in the column 'revenues_from_appointment'

Dataset

appointment_date patient_id practitioner_id appointment_duration_min revenues_from_appointment
2021-06-28 42734 748 30 90.0
2021-06-29 42737 747 60 150.0
2021-07-01 42737 747 60 NaN
2021-07-03 42736 748 30 60.0
2021-07-03 42735 747 15 42.62
2021-07-04 42734 748 30 NaN
2021-07-05 42734 748 30 100.0
2021-07-10 42738 747 15 50.72
2021-08-12 42739 748 30 73.43

I wish to replace NULL values by the mean value of rows where "patient_id, practitioner_id, appointment_duration_min" is the same.

I did it using pandas dataframe,

df['revenues_from_appointment'].fillna(df.groupby(['patient_id','practitioner_id','appointment_duration_min'])['revenues_from_appointment'].transform('mean'), inplace = True)

How can we obtain the same result by using SQL?

Final Output

appointment_date patient_id practitioner_id appointment_duration_min revenues_from_appointment
2021-06-28 42734 748 30 90.0
2021-06-29 42737 747 60 150.0
2021-07-01 42737 747 60 150.0
2021-07-03 42736 748 30 60.0
2021-07-03 42735 747 15 42.62
2021-07-04 42734 748 30 95.0
2021-07-05 42734 748 30 100.0
2021-07-10 42738 747 15 50.72
2021-08-12 42739 748 30 73.43
  • mysql and sql server are two different database products with different sql syntax. Which one do you use? – Shadow Jun 01 '22 at 10:05
  • I am trying on ms-sql – Omkar Salokhe Jun 01 '22 at 10:07
  • I had a similar issue, used python to replace the NULL values as shown in this link. https://stackoverflow.com/questions/18689823/pandas-dataframe-replace-nan-values-with-average-of-columns – JonTout Jun 01 '22 at 10:37

1 Answers1

0

You can use the AVG window function, that will partition on the three column of interest and replace null values using the COALESCE function:

SELECT appointment_date,
       patient_id,
       practitioner_id,
       appointment_duration_min,
       COALESCE(revenues_from_appointment, 
                AVG(revenues_from_appointment) OVER(PARTITION BY patient_id, 
                                                                 practitioner_id, 
                                                                 appointment_duration_min))
FROM tab

Try it here.

lemon
  • 14,875
  • 6
  • 18
  • 38