I have a spreadsheet with hundreds of variables, that I need to visualize. To do so I'm using df.melt() to convert the dataset to long format. Which is working well to visualize all the variables in subplots.
The only problem is that it creates a lot of charts. I'm wondering if there is a way to "melt" only a certain number of variables at a time, something that would allow you to select which columns you want to melt in the dataframe. For example (pseudo code):
df.melt(id_vars=[x,y,z], var_name = 'Metric', cols = [4:15])
I know "cols =" is not a valid argument but is there something I could do similar to that to just visualize chunks of the metrics at a time?
Here is an example of my existing code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
pd.set_option("display.max_columns", 200)
data = {'PTID': [11111, 11111, 11111, 11111, 22222, 22222, 22222, 22222, 33333, 33333, 33333, 33333, 44444, 44444, 44444, 44444, 55555, 55555, 55555, 55555],
'Period' : ['Baseline','p1','p2','p3','Baseline','p1','p2','p3','Baseline','p1','p2','p3', 'Baseline','p1','p2','p3', 'Baseline','p1','p2','p3']
'ALK PHOS': [46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0, 46.0, 94.0, 21.0, 18.0],
'AST (SGOT)': [23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0, 23.0, 82.0, 13.0, 17.0],
'% Saturation- Iron': [34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0, 34.0, 65.0, 10.0, 14.0]}
dfm = df.melt(id_vars=['PTID','Period'], var_name='Metric',value_name='Value')
dfg = dfm.groupby(['PTID','Name','Period', 'Metric'])['Value'].mean().reset_index()
p = sns.relplot(data=dfg, col='Metric', x='Period', y='Value', kind='scatter', hue = 'Name',col_wrap=5, marker='o', palette='tab10',facet_kws={'sharey': False, 'sharex': True},)
p.map(sns.lineplot, 'Period', 'Value', linestyle='--', color='gray', ci = None)