
My data has two categorical variables:

  • function (5 distinct values)
  • diploma (35 distinct values)

import pandas as pd
df = pd.DataFrame({'function': ['nurse', 'doctor', 'paediatric_nurse', 'kitchen_staff', 'surgeon'], 'diploma': ['nurse_schoolA', 'nurse_schoolB', ..., 'nurse_schoolM', 'doctor_schoolA', ...]})

For each function, I want a graph showing a count of each diploma.

import seaborn as sns
ax = sns.catplot(x='diploma', kind='count', data=df, orient="h", col='function')
ax.fig.autofmt_xdate()

Is there a way to limit the data shown for each function to only the diplomas that actually occur for that function?

I tried the following, which fails with an error saying that df is not recognized:

ax=sns.catplot(x='diploma',kind='count',data=df.query("df['diploma'].count()>0"),orient="h", col='function')
ax.fig.autofmt_xdate()
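
A note on why this attempt fails, as a minimal sketch rather than a definitive diagnosis: inside `DataFrame.query`, bare names are looked up among the frame's columns and index, so `df` is not defined there. The sample rows below are made up for illustration (the real data is not shown in the question). The sketch also suggests that no row filter is needed for the counts themselves, since grouping by both columns only yields the combinations that actually occur.

```python
import pandas as pd

# Hypothetical sample rows; the real data in the question is not shown.
df = pd.DataFrame({
    "function": ["nurse", "nurse", "nurse", "doctor", "doctor", "surgeon"],
    "diploma": ["nurse_schoolA", "nurse_schoolA", "nurse_schoolB",
                "doctor_schoolA", "doctor_schoolB", "doctor_schoolA"],
})

# Bare names in a query string refer to columns, so `df` cannot be resolved.
try:
    df.query("df['diploma'].count() > 0")
    query_failed = False
except (NameError, KeyError) as err:
    # pandas reports an undefined name here; KeyError is caught as a precaution.
    query_failed = True
    print(f"query failed: {err}")

# Counting per (function, diploma) pair only returns pairs that occur.
counts = df.groupby(["function", "diploma"]).size()
print(counts)
```

With string columns like these, `counts` has no zero entries, so a `count() > 0` filter would not remove anything; the empty bars come from the plot sharing one set of categories across all panels.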

[Images omitted: plot output; df.head(); diploma]

John Rotenstein
Lysis90
  • Can you share a sample of your data and show the full error message? – Alexandre B. Jul 30 '19 at 15:06
  • --------------------------------------------------------------------------- KeyError Traceback (most recent call last) ~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\scope.py in resolve(self, key, is_local) 180 if self.has_resolvers: --> 181 return self.resolvers[key] 182 ~\AppData\Local\Continuum\anaconda3\lib\collections\__init__.py in __missing__(self, key) 905 def __missing__(self, key): --> 906 raise KeyError(key) 907 KeyError: 'df' – Lysis90 Jul 30 '19 at 15:12
  • The error comes from `data=df.query("df['diploma'].count()>0")`. Inside `query`, `df` is interpreted as a column name of your dataframe. Sharing a sample of the data will help to understand how it is structured. Also, it's better to *edit* the question instead of posting long pieces of code or error output in comments. – Alexandre B. Jul 30 '19 at 15:24
  • Thank you, I am quite a novice here :) – Lysis90 Jul 30 '19 at 15:30
  • You might try to define the count per diploma in a variable, `d = df['diploma'].count()`, and then plot `data = d[d > 0]`? Without a data sample, it's hard to guess. I advise you to have a look at [*How to make good reproducible pandas examples*](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Alexandre B. Jul 30 '19 at 15:44
  • That doesn't work: then d would be a constant. I need d to vary depending on the diploma category and the function – Lysis90 Jul 30 '19 at 15:50
  • Can you share the result of `df.head(20)`? – Alexandre B. Jul 30 '19 at 15:57
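
One possible way to get what the question asks for, offered as a sketch under assumptions and not as a confirmed answer: draw one `sns.countplot` per function on its own matplotlib axes, passing only the diplomas present in that subset as `order`. The sample rows, the `shown` dictionary, and the figure size are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample rows; the real data in the question is not shown.
df = pd.DataFrame({
    "function": ["nurse", "nurse", "nurse", "doctor", "doctor", "surgeon"],
    "diploma": ["nurse_schoolA", "nurse_schoolA", "nurse_schoolB",
                "doctor_schoolA", "doctor_schoolB", "doctor_schoolA"],
})

functions = sorted(df["function"].unique())
fig, axes = plt.subplots(1, len(functions),
                         figsize=(4 * len(functions), 3), squeeze=False)

shown = {}  # records which diplomas each panel displays
for ax, name in zip(axes[0], functions):
    subset = df[df["function"] == name]
    # Restrict the category order to the diplomas present for this function.
    order = sorted(subset["diploma"].unique())
    shown[name] = order
    sns.countplot(data=subset, x="diploma", order=order, ax=ax)
    ax.set_title(name)
    ax.tick_params(axis="x", labelrotation=45)

fig.tight_layout()
```

Each panel then has its own x axis, so diplomas with no rows for a given function never appear there. Call `plt.show()` or `fig.savefig(...)` to view the result in an interactive session.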

0 Answers