I am new to programming and would appreciate your help. I am trying to avoid repeating code when querying a pandas DataFrame.

x1 is the dataframe with various column names such as Hypertension, Diabetes, Alcoholism, Handicap, Age_Group, Date_Appointment

Each of the disease columns listed above contains 0 (does not have the disease) or 2/3/4 (different stages of the disease).

So when I filter on ' != 0 ' it will list records for patients with that specific disease. As such each disease will filter out different sets of records.

I wrote the query below 4 times, replacing the word Hypertension with each of the other diseases, to get a separate graph for each disease.

But this is not clean code. I need help understanding which function could be used, and how, so that I can write just 1 query instead of 4.

hyp1 = x1.query('Hypertension != 0')
i1 = hyp1.groupby('Age_Group')['Hypertension'].value_counts().plot(kind = 'bar',label = 'Hypertension',figsize=(6, 6))
plt.title('Appointments Missed by Patients with Hypertension')
plt.xlabel('Hypertension Age_Group')
plt.ylabel('Appointments missed');

Below is another set I don't know how to condense.

`print('Details of all appointments')`
`print('')`
`print(df.Date_Appointment.value_counts().sort_index())`
`print('')`
`print(df.Date_Appointment.describe())`
`print('')`
`print(df.Date_Appointment.value_counts().describe())`
`print('')`
`print('Mean = ', (round(df.Date_Appointment.value_counts().mean())))`
`print('Median = ', (round(df.Date_Appointment.value_counts().median())))`
`print('Mode = ', (df.Date_Appointment.value_counts().mode()))`

Would appreciate your detailed response. Thank you in advance.

  • Please [provide a reproducible copy of the DataFrame with `to_clipboard`](https://stackoverflow.com/questions/52413246/provide-a-reproducible-copy-of-the-dataframe-with-to-clipboard/52413247#52413247) – Trenton McKinney Apr 21 '20 at 04:21

1 Answer

  • Create a list of the desired columns
  • Iterate through them
  • Use f-strings (e.g. f'{...}')
import matplotlib.pyplot as plt

# Map each disease column to the color of its bars
diseases = {'Hypertension': 'red', 'Diabetes': 'blue', 'Alcoholism': 'green', 'Handicap': 'yellow'}

for disease, color in diseases.items():
    # Keep only patients who have the disease (stage != 0)
    subset = x1.query(f'{disease} != 0')
    # Count disease stages within each age group and plot them as bars
    subset.groupby('Age_Group')[disease].value_counts().plot(kind='bar', label=disease, figsize=(6, 6), color=color)
    plt.title(f'Appointments Missed by Patients with {disease}')
    plt.xlabel(f'{disease} Age Group')
    plt.ylabel('Appointments missed')
    plt.show()
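To check the filter-and-count step without the original data, here is a minimal sketch on a small made-up frame. The sample values and the helper name `stage_counts` are assumptions for illustration only. The real `x1` is not shown in the question. The helper uses a boolean mask rather than `.query()`, which should select the same rows.

```python
import pandas as pd

# Hypothetical sample data; the real x1 is not shown in the question
x1 = pd.DataFrame({
    'Age_Group': ['0-18', '19-40', '19-40', '41-65', '41-65', '65+'],
    'Hypertension': [0, 2, 0, 3, 2, 4],
    'Diabetes': [0, 0, 2, 0, 3, 3],
})


def stage_counts(frame, disease):
    """Count disease stages per age group, ignoring rows where the stage is 0."""
    subset = frame[frame[disease] != 0]  # boolean mask instead of .query()
    return subset.groupby('Age_Group')[disease].value_counts()


# One result per disease, built with the same loop idea as above
counts = {disease: stage_counts(x1, disease) for disease in ['Hypertension', 'Diabetes']}
print(counts['Hypertension'])
```

Each value in `counts` is a Series indexed by age group and stage, so calling `.plot(kind='bar')` on it should give the same kind of chart as in the loop.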
  • Incidentally, this would be easier with sample data to work with
  • For the second half, it's not clear what you want to condense or replace Date_Appointment with.
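If the goal for the second half is only to avoid recomputing `value_counts()` on every line, one possible sketch is to compute it once and reuse it. The function name `summarize_counts` and the sample dates are hypothetical. `df.Date_Appointment` is assumed to be an ordinary column of dates.

```python
import pandas as pd


def summarize_counts(series):
    """Summarize how often each distinct value occurs in `series`."""
    counts = series.value_counts()  # computed once, reused below
    return {
        'mean': round(counts.mean()),
        'median': round(counts.median()),
        'mode': counts.mode().tolist(),  # mode() can return several values
    }


# Hypothetical sample data standing in for df.Date_Appointment
df = pd.DataFrame({'Date_Appointment': ['2016-05-02', '2016-05-02', '2016-05-03',
                                        '2016-05-03', '2016-05-03', '2016-05-04']})

summary = summarize_counts(df['Date_Appointment'])
for name, value in summary.items():
    print(f'{name.title()} = {value}')
```

Because the function takes any Series, the same call should work for another column in place of `Date_Appointment`.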
Trenton McKinney
  • Hello, could you also please clarify how to use a different color in the graph for each disease? I tried listing the colors as color = [....] before the 'for....in....' statement but couldn't figure out how to assign them individually to each disease. Then I tried assigning them in the i1= ........plot(........,color = ['red', 'blue', 'green','yellow'.......) call but got an error. Not sure what to do now. – user13367515 Apr 22 '20 at 04:28
  • @user13367515 I've updated `diseases` to be a `dict` mapping each `disease` to a `color`; the `for` loop now unpacks the `dict` and adds `color=color` to `.plot()`. – Trenton McKinney Apr 22 '20 at 05:34