import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv(r"G:\learning python\medical-data visualizer/medical_examination.csv")
df["overweight"] = (df["weight"]/pow(df["height"]/100, 2) > 25).astype(int)
df["cholesterol"] = (df["cholesterol"] > 1).astype(int)
df["gluc"] = (df["gluc"] > 1).astype(int)
df_cat = pd.melt(df, id_vars =["cardio"], value_vars = ["cholesterol", "gluc", "smoke", "alco", "active", "overweight"])
df_cat = df_cat.groupby(['cardio','variable','value']).size()
print(df_cat)
This is my series:
cardio  variable     value
0       active       0         6378
                     1        28643
        alco         0        33080
                     1         1941
        cholesterol  0        29330
                     1         5691
        gluc         0        30894
                     1         4127
        overweight   0        15915
                     1        19106
        smoke        0        31781
                     1         3240
1       active       0         7361
                     1        27618
        alco         0        33156
                     1         1823
        cholesterol  0        23055
                     1        11924
        gluc         0        28585
                     1         6394
        overweight   0        10539
                     1        24440
        smoke        0        32050
                     1         2929
I'd like to convert it into a DataFrame with the columns cardio, variable, value and total, where total holds the counts from the last, unnamed column of the series. I tried .to_frame(), but it accepts only a single column name, so I can't name all four columns correctly. How can I do this? Thanks in advance!
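
One possible approach, sketched under assumptions: the three labels cardio, variable and value are levels of the series' MultiIndex rather than columns, so Series.reset_index(name=...) can turn them into columns and name the counts column in a single step. The small DataFrame below is made-up sample data standing in for medical_examination.csv, and the names counts and df_total are only illustrative.

```python
import pandas as pd

# Made-up sample standing in for medical_examination.csv (illustrative values only)
df = pd.DataFrame({
    "cardio": [0, 0, 1, 1],
    "smoke":  [0, 1, 0, 0],
    "alco":   [0, 0, 1, 1],
})

df_cat = pd.melt(df, id_vars=["cardio"], value_vars=["smoke", "alco"])

# groupby(...).size() returns a Series indexed by a three-level MultiIndex
counts = df_cat.groupby(["cardio", "variable", "value"]).size()

# reset_index moves the index levels into ordinary columns;
# name= gives the previously unnamed counts column its label
df_total = counts.reset_index(name="total")
print(df_total)
```

The same result should also be reachable with counts.to_frame("total").reset_index(), since to_frame only names the values column and reset_index then moves the index levels into columns.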