I have a DataFrame with missing values spread across several rows that share the same date. I need to combine all the rows for each date into a single row. Could you please help me with this?
Input data:
Expected Intermediate Output:
After removing duplicates:
import pandas as pd
import numpy as np
# Creating a DataFrame with the input data
data = {
    'PatientId': [680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366, 680366],
    'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
             '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan]
}
df = pd.DataFrame(data)
# Convert the 'Date' column to datetime; an explicit format avoids month/day ambiguity
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y %H:%M')
# Group by 'PatientId' and 'Date' and sum each value column; min_count=1 keeps NaN where a group has no values at all
df_combined = df.groupby(['PatientId', 'Date']).sum(numeric_only=True, min_count=1).reset_index()
print(df_combined)
If you don't need the original data, you can assign the result back to df instead of creating df_combined.
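Note that sum() adds up every non-null value in a group, so on this sample it returns 11 for value3 on 8/15/22 (4 + 7) and 8 for value4 on 10/21/22 (4 + 4). If the repeated rows are meant as duplicates rather than amounts to add, a hedged alternative is GroupBy.first(), which keeps the first non-null value of each column per group. The sketch below assumes that interpretation; the intermediate step (collecting the distinct non-null values per cell) is only an illustration of how to spot dates where a column holds conflicting values, such as value3 on 8/15/22, where first() keeps 4 and drops 7.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above
data = {
    'PatientId': [680366] * 9,
    'Date': ['8/4/22 10:02', '8/4/22 10:02', '8/4/22 10:02', '8/15/22 10:04', '8/15/22 10:04',
             '8/15/22 10:04', '10/21/22 12:19', '10/21/22 12:19', '10/21/22 12:19'],
    'value1': [np.nan, np.nan, np.nan, 3, np.nan, np.nan, np.nan, np.nan, np.nan],
    'value3': [2, np.nan, np.nan, 4, np.nan, 7, np.nan, np.nan, 7],
    'value4': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 4, 4, np.nan],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y %H:%M')

keys = ['PatientId', 'Date']

# Intermediate view: the distinct non-null values found in each column for every date
intermediate = df.groupby(keys).agg(lambda s: sorted(set(s.dropna().tolist()))).reset_index()
print(intermediate)

# Duplicates removed: keep the first non-null value of each column for every date
df_first = df.groupby(keys).first().reset_index()
print(df_first)
```

Which aggregation is right depends on what the repeated values mean in your data; if conflicting values should be resolved differently (for example by max() or mean()), swap first() for that aggregation.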