Impute missing values with group by in pandas

Question

Having dataframe with missing values in terms of date. I need to combine all the rows into one by date column. could you please help me on this.

Input data:

Expected Intermediate Output:

After removing duplicates

What are you expecting if there are conflicting values? i.e. if `value2` for `8/15/22 10:04` has both `7` and `5`? — Shorn, Jun 19 '23 at 04:16
No. The dates are different. I just want to fill based on dates. — balams, Jun 19 '23 at 04:27
In your example you have two values for `value2` on `8/15/22 10:04` which are of the same value. I am asking what you are expecting if they are the same value. Also, what have you tried to accomplish this so far? — Shorn, Jun 19 '23 at 04:28
I expect same value which is 7. Actually have some medical records. On a same day they enter lab results in different time periods. I just want to consolidate all the values taken on same day with minimal missing values — balams, Jun 19 '23 at 04:32
`df.groupby("Date").agg(lambda x: x.dropna().tolist()).reset_index()`. You can then say, get the first or the last element of the list to get the first and last el of the list. — Yubo, Jun 19 '23 at 04:33

score 1 · Answer 1 · answered Jun 19 '23 at 04:29

import pandas as pd
import numpy as np

# Creating a DataFrame with the input data
data = {
    'PatientId': [680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366],
    'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
             '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan]
}

df = pd.DataFrame(data)

# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Group by 'PatientId' and 'Date', and aggregate the values by summing non-null values in each group
df_combined = df.groupby(['PatientId', 'Date']).sum(numeric_only=True, min_count=1).reset_index()

print(df_combined)

You can use same df for save your dataframe

Thanks for your support. These are lab results so i couldn't able to sum. I reedit my code. I add my expected intermediate table. may be is this give clear picture — balams, Jun 19 '23 at 04:51

Impute missing values with group by in pandas

1 Answers1