My data is in a JSON file. This is how it is organized:
"summary": {
"file_count": 2
},
"primary_site": "stomach",
"disease_type": "acid reflux",
"project": {
"project_id": "Pro123"
},
"diagnoses": [
{
"primary_diagnosis": "GERD"
}
],
"demographic": {
"ethnicity": "not hispanic or latino",
"gender": "female",
"race": "Unknown"
}

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4236 entries, 0 to 4235
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   summary       4236 non-null   object
 1   primary_site  4236 non-null   object
 2   disease_type  4236 non-null   object
 3   project       4236 non-null   object
 4   diagnoses     4236 non-null   object
 5   demographic   4236 non-null   object
dtypes: object(6)
I want to group by disease type and gender, but gender is not a column of its own: it is a key nested inside the demographic column. How do I refer to that nested key in the groupby call? This is what I have so far:
df2 = df.groupby('disease_type')['gender'].
print(df2)
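
One possible approach, sketched under some assumptions: if each cell of the demographic column holds a Python dict (as the object dtype and the JSON layout suggest), the nested keys can be flattened into their own columns with `pd.json_normalize`, after which `demographic.gender` can be passed to `groupby` like any other column. The sample records and the `make_record` helper below are made up for illustration and only mirror the structure shown above; they are not the real data.

```python
import pandas as pd


def make_record(disease_type, gender):
    """Hypothetical helper: builds one record shaped like the JSON in the question."""
    return {
        "summary": {"file_count": 2},
        "primary_site": "stomach",
        "disease_type": disease_type,
        "project": {"project_id": "Pro123"},
        "diagnoses": [{"primary_diagnosis": "GERD"}],
        "demographic": {
            "ethnicity": "not hispanic or latino",
            "gender": gender,
            "race": "Unknown",
        },
    }


# Made-up sample data standing in for the 4236 real rows.
records = [
    make_record("acid reflux", "female"),
    make_record("acid reflux", "male"),
    make_record("acid reflux", "female"),
]

# Assumption: this mimics the frame in the question, where nested
# objects end up as dicts/lists inside object-dtype columns.
df = pd.DataFrame(records)

# Flatten nested dicts into dotted column names such as
# "demographic.gender" and "project.project_id". Lists (for example
# "diagnoses") are left as they are.
flat = pd.json_normalize(df.to_dict(orient="records"))

# Group on the top-level column and the flattened nested key together.
counts = flat.groupby(["disease_type", "demographic.gender"]).size()
print(counts)
```

If only the one nested field is needed, pulling it out directly with something like `df["gender"] = df["demographic"].map(lambda d: d.get("gender"))` and then grouping on `["disease_type", "gender"]` should give the same counts without flattening everything.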