
I need to create a list that classifies each patient in a df as 'High', 'Medium', or 'Low' risk, depending on their BMI and whether they smoke. When I run the code below, I get 'Medium' for all six entries. (Pseudonyms and made-up data have been used.)

df = pd.DataFrame({'Name':['Jordan', 'Jess', 'Jake', 'Alice', 'Alan', 'Lauren'],
                   'Age':[26, 23, 19, 20, 24, 28],
                   'Sex':['M', 'F' , 'M', 'F', 'M', 'F'],
                   'BMI':[26, 22, 24, 17, 35, 20],
                   'Smokes':['No', 'No', 'Yes', 'No', 'Yes', 'No']})


risk_list = []

for i in df.Name:
  if df.BMI.any() > 30 | df.BMI.any() < 19.99 | df.Smokes.any() == "Yes":
    risk_list.append("High")
  elif df.BMI.any() >= 25 & df.BMI.any() <= 29.99:
    risk_list.append("Medium")
  elif df.BMI.any() < 24.99 & df.BMI.any() > 19.99 and df.Smokes.any() == "No":
    risk_list.append("Low")

print(risk_list)

Output:

['Medium', 'Medium', 'Medium', 'Medium', 'Medium', 'Medium']

I am new to pandas, and to Python for that matter. I think I am close, but I cannot figure out why the results are not coming back correctly.

Thanks.

3 Answers


There are several problems in your code. To name a few:

  1. You need parentheses, because | binds more tightly than the comparison operators: df.BMI.any() > 30 | df.BMI.any() < 19.99 should be (df.BMI.any() > 30) | (df.BMI.any() < 19.99)

  2. & is not the same as and: & is the bitwise operator (element-wise on pandas objects), while and is the logical operator for single True/False values.

  3. Everything inside the loop, e.g. df.BMI.any(), is independent of the row you are looking at (i.e. of Name), so you get the same value on every iteration.
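The points above can be checked directly. The sketch below assumes the question's data trimmed to two columns; the variable names are only for illustration. It shows that .any() collapses the whole column to a single boolean, and how Python groups the unparenthesised expression, which is why the 'Medium' branch was taken every time.

```python
import pandas as pd

df = pd.DataFrame({'BMI': [26, 22, 24, 17, 35, 20],
                   'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']})

# .any() reduces the whole column to one boolean, so it cannot vary by row.
any_bmi = bool(df.BMI.any())   # True: at least one BMI is non-zero

# `&` binds tighter than the comparisons, so
#     x >= 25 & x <= 29.99
# is parsed as the chained comparison
#     x >= (25 & x) <= 29.99
x = any_bmi
unparenthesised = x >= 25 & x <= 29.99
explicit = (x >= (25 & x)) and ((25 & x) <= 29.99)

# With parentheses, each comparison is evaluated before `&`.
parenthesised = (x >= 25) & (x <= 29.99)

print(any_bmi, unparenthesised, explicit, parenthesised)
```

Because 25 & True evaluates to 1, the unparenthesised test becomes True >= 1 <= 29.99, which holds no matter what the BMI values are.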

I think you can use np.select:

np.select([df.BMI.gt(30) | df.BMI.lt(19.99) | df.Smokes.eq('Yes'),
           df.BMI.between(25,29.99)],
          ['High', 'Medium'], 'Low')

Output:

array(['Medium', 'Low', 'High', 'High', 'High', 'Low'], dtype='<U6')
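For completeness, here is a self-contained sketch of the same np.select call, using the question's data and storing the result in a column. The column name Risk and the variables conditions and choices are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jordan', 'Jess', 'Jake', 'Alice', 'Alan', 'Lauren'],
                   'BMI': [26, 22, 24, 17, 35, 20],
                   'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']})

# One boolean Series per category; each is evaluated element-wise per row.
conditions = [
    df.BMI.gt(30) | df.BMI.lt(19.99) | df.Smokes.eq('Yes'),  # High
    df.BMI.between(25, 29.99),                               # Medium
]
choices = ['High', 'Medium']

# np.select takes the first matching condition for each row,
# and falls back to `default` when none match.
df['Risk'] = np.select(conditions, choices, default='Low')

risk_list = df['Risk'].tolist()
print(risk_list)
```

Note that the order of the conditions matters, since the first match wins, and that the default catches every remaining row, including values that sit between the stated thresholds (a non-smoker with a BMI of exactly 30, for instance).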
Quang Hoang

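In addition to @QuangHoang's answer: looping over a dataframe row by row can feel more intuitive when you are starting out. To do that, iterate with .iterrows() rather than over your Name column. A dataframe is not a dictionary, so looping over one column does not give you the other values in that row.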

risk_list = []

for _, i in df.iterrows():
    if i.BMI > 30 or i.BMI < 19.99 or i.Smokes == "Yes":
        risk_list.append("High")
    elif i.BMI >= 25 and i.BMI <= 29.99:
        risk_list.append("Medium")
    elif i.BMI < 24.99 and i.BMI > 19.99 and i.Smokes == "No":
        risk_list.append("Low")

>>> print(risk_list)
    ['Medium', 'Low', 'High', 'High', 'High', 'Low']
Camilo Martinez M.
  • `iterrows` can be a convenient solution, but it [suffers from performance issues](https://stackoverflow.com/questions/24870953/does-pandas-iterrows-have-performance-issues). Using `.apply` would be better – Derek O May 10 '21 at 18:23
  • Yup, I would go with `np.select` anytime just for that reason alone, but since he's new to Python and pandas and was confused with how to access values in a dataframe, this is a good starting point. – Camilo Martinez M. May 10 '21 at 18:26

You can define this as a function and pass it to .apply():

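def risk_eval(row):
    # row holds single values, so use `or`/`and` here; `|` and `&`
    # bind tighter than the comparisons and would be mis-grouped.
    if row.BMI > 30 or row.BMI < 19.99 or row.Smokes == "Yes":
        return "High"
    elif row.BMI >= 25 and row.BMI <= 29.99:
        return "Medium"
    elif row.BMI < 24.99 and row.BMI > 19.99 and row.Smokes == "No":
        return "Low"

# axis=1 passes each row to risk_eval in turn
df['Risk'] = df.apply(risk_eval, axis=1)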

And then get the list with:

df['Risk'].values.tolist()
Celius Stingher
  • In my opinion, this answer is the most intuitive and I think is a good way to learn about using `apply` for someone relatively new to pandas – Derek O May 10 '21 at 18:30
  • Agreed and thanks, I was originally thinking of `np.where()` but there are so many conditions I thought this would be more straightforward – Celius Stingher May 10 '21 at 18:36
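To illustrate the np.where() alternative mentioned in the last comment, here is a rough sketch using the question's data. The variable names high, medium and risk are assumptions for illustration. With three categories it already needs one nested call, which hints at why it gets harder to read as the number of conditions grows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jordan', 'Jess', 'Jake', 'Alice', 'Alan', 'Lauren'],
                   'BMI': [26, 22, 24, 17, 35, 20],
                   'Smokes': ['No', 'No', 'Yes', 'No', 'Yes', 'No']})

# Element-wise boolean masks; note the parentheses around each comparison.
high = (df.BMI > 30) | (df.BMI < 19.99) | (df.Smokes == 'Yes')
medium = df.BMI.between(25, 29.99)

# np.where(cond, a, b) picks a where cond is True, else b.
# Each extra category adds another level of nesting.
risk = np.where(high, 'High', np.where(medium, 'Medium', 'Low'))

risk_list = risk.tolist()
print(risk_list)
```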