0

Having dataframe with missing values in terms of date. I need to combine all the rows into one by date column. could you please help me on this.

Input data:

enter image description here

Expected Intermediate Output:

enter image description here

After removing duplicates

enter image description here

balams
  • 85
  • 5
  • What are you expecting if there are conflicting values? i.e. if `value2` for `8/15/22 10:04` has both `7` and `5`? – Shorn Jun 19 '23 at 04:16
  • No. The dates are different. I just want to fill based on dates. – balams Jun 19 '23 at 04:27
  • In your example you have two values for `value2` on `8/15/22 10:04` which are of the same value. I am asking what you are expecting if they are the same value. Also, what have you tried to accomplish this so far? – Shorn Jun 19 '23 at 04:28
  • I expect same value which is 7. Actually have some medical records. On a same day they enter lab results in different time periods. I just want to consolidate all the values taken on same day with minimal missing values – balams Jun 19 '23 at 04:32
  • `df.groupby("Date").agg(lambda x: x.dropna().tolist()).reset_index()`. You can then say, get the first or the last element of the list to get the first and last el of the list. – Yubo Jun 19 '23 at 04:33

1 Answers1

1

import pandas as pd
import numpy as np

# Creating a DataFrame with the input data
data = {
    'PatientId': [680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366],
    'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
             '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan]
}

df = pd.DataFrame(data)

# Convert the 'Date' column to datetime type
df['Date'] = pd.to_datetime(df['Date'])

# Group by 'PatientId' and 'Date', and aggregate the values by summing non-null values in each group
df_combined = df.groupby(['PatientId', 'Date']).sum(numeric_only=True, min_count=1).reset_index()

print(df_combined)

You can use same df for save your dataframe
Shanu
  • 124
  • 4
  • Thanks for your support. These are lab results so i couldn't able to sum. I reedit my code. I add my expected intermediate table. may be is this give clear picture – balams Jun 19 '23 at 04:51