I have a time series data set of multiple IDs and multiple variables, each variable has 3 time series entries - "baseline", "3 month", "6 month". The dataframe is structured like this, df =
import pandas as pd
data = {'Patient ID': [11111, 11111, 11111, 11111, 22222, 22222, 22222, 22222, 33333, 33333, 33333, 33333, 44444, 44444, 44444, 44444, 55555, 55555, 55555, 55555],
'Lab Attribute': ['% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)', '% Saturation- Iron', 'ALK PHOS', 'ALT(SGPT)', 'AST (SGOT)'],
'Baseline': [46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0],
'3 Month': [23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0],
'6 Month': [34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0]}
df = pd.DataFrame(data)
Patient ID Lab Attribute Baseline 3 Month 6 Month
0 11111 % Saturation- Iron 46.0 23.0 34.0
1 11111 ALK PHOS 94.0 82.0 65.0
2 11111 ALT(SGPT) 21.0 13.0 10.0
3 11111 AST (SGOT) 18.0 17.0 14.0
4 22222 % Saturation- Iron 46.0 23.0 34.0
5 22222 ALK PHOS 94.0 82.0 65.0
6 22222 ALT(SGPT) 21.0 13.0 10.0
7 22222 AST (SGOT) 18.0 17.0 14.0
8 33333 % Saturation- Iron 46.0 23.0 34.0
9 33333 ALK PHOS 94.0 82.0 65.0
10 33333 ALT(SGPT) 21.0 13.0 10.0
11 33333 AST (SGOT) 18.0 17.0 14.0
12 44444 % Saturation- Iron 46.0 23.0 34.0
13 44444 ALK PHOS 94.0 82.0 65.0
14 44444 ALT(SGPT) 21.0 13.0 10.0
15 44444 AST (SGOT) 18.0 17.0 14.0
16 55555 % Saturation- Iron 46.0 23.0 34.0
17 55555 ALK PHOS 94.0 82.0 65.0
18 55555 ALT(SGPT) 21.0 13.0 10.0
19 55555 AST (SGOT) 18.0 17.0 14.0
What I'm trying to do is group by the ID and Lab Attribute and create a plot of each "Lab Attribute" - % Saturation- Iron, ALK PHOS, etc., that will include the time series for all of the Patient IDs.
So based on the example data there would be 4 plots - % Saturation- Iron, ALK PHOS, etc., that each would contain 5 traces (1 for each ID).
I tried using groupby per this article - Creating a time-series plot with data in long format in python?
Though it just plots everything on one chart.
This is the code I have so far:
df_labs = pd.read_csv("/Users/johnconor/Documents/Python/gut_microbiome/out/nw_labs_up_to_6mon.csv")
df_labs = df_labs.fillna(method='ffill')
dfl = df_labs.groupby(['Patient_ID', 'Lab_Attribute'])['Baseline','3 Month','6 Month'].sum().plot()
This was the outcome:
[![enter image description here][1]][1]
Part of the problem I'm having is that all of the examples I can find have long format data with only 1 values column. Not a number of values over time.
I also tried to utilize the approach for multiple plots per this article - Creating a time-series plot with data in long format in python?
n_ids = df_labs.Patient_ID.unique().size
n_cols = int(n_ids ** 0.5)
n_rows = n_cols + (1 if n_ids % n_cols else 0)
fig, axes = plt.subplots(n_rows, n_cols)
axes = axes.ravel()
for i, (id, att, base,three,six) in enumerate(df_labs.groupby(['Patient_ID', 'Lab_Attribute'])['Baseline','3 Month','6 Month'].sum().reset_index()):
print(idx)
series.plot(ax=axes[i], title=f"ID:{idx}")
fig.tight_layout()
Though am running into issues as it is again designed for only 1 set of values. Producing the error:
ValueError: too many values to unpack (expected 5)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-40-42cf5bc14bdb> in <module>
4 fig, axes = plt.subplots(n_rows, n_cols)
5 axes = axes.ravel()
----> 6 for i, (id, att, base,three,six) in enumerate(df_labs.groupby(['Patient_ID', 'Lab_Attribute'])['Baseline','3 Month','6 Month'].sum().reset_index()):
7 print(idx)
8 series.plot(ax=axes[i], title=f"ID:{idx}")
ValueError: too many values to unpack (expected 5)