
I'm new to R and trying to populate a column bmi_cat in my data frame dat based on the numerical value provided in the bmi column. However, it populates all of bmi_cat with "Normal" even when that is incorrect. The last bmi value in the dataframe is within the Normal range, so I suspect it is continuously updating the entirety of bmi_cat with the most recent result. However, I'm not sure why. Can anyone point out the fault in my approach?

for (num in 1:nrow(dat)){
  if (dat$bmi[num] <= 18.5) {
      dat$bmi_cat[num] <- "Underweight"
  }  else if (dat$bmi[num] > 18.5 & dat$bmi[num] <= 25) {
      dat$bmi_cat[num] <- "Normal" 
  }  else if (dat$bmi[num] > 25 & dat$bmi[num] < 30) {
      dat$bmi_cat[num] <- "Overweight"    
  }  else if (dat$bmi[num] >= 30){
      dat$bmi_cat[num] <- "Obese"
  }
}

I hope this was enough information. Thank you in advance.

clmil
  • You can use `ifelse` or even `cut` to do this in a fast, vectorised way: https://stackoverflow.com/questions/13559076/convert-continuous-numeric-values-to-discrete-categories-defined-by-intervals?noredirect=1&lq=1 – Ronak Shah Oct 15 '20 at 06:35

1 Answer


The way I see it, loops are rarely needed in R. For your particular case, an indexing approach using which() should work just fine and is a lot simpler. Try this and let me know if it works:

index_underweight <- which(dat$bmi <= 18.5)
index_normal <- which(dat$bmi > 18.5 & dat$bmi <= 25)
index_overweight <- which(dat$bmi > 25 & dat$bmi < 30)
index_obese <- which(dat$bmi >= 30)

Now use these indexes to populate the bmi_cat column:

dat$bmi_cat[index_underweight] <- "Underweight"
dat$bmi_cat[index_normal] <- "Normal"
dat$bmi_cat[index_overweight] <- "Overweight"
dat$bmi_cat[index_obese] <- "Obese"
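
For reference, here is a minimal self-contained sketch of the two steps above. The sample data frame is made up for illustration (the real dat is not shown in the question), and the bmi_cat column is created explicitly before it is filled:

```r
# Hypothetical sample data; the real dat comes from the question
dat <- data.frame(bmi = c(17.0, 18.5, 22.3, 25.0, 27.8, 30.0, 35.2))

# Create the column up front so the indexed assignments have something to fill
dat$bmi_cat <- NA_character_

# Row positions for each category, using the cut-offs from the question
index_underweight <- which(dat$bmi <= 18.5)
index_normal      <- which(dat$bmi > 18.5 & dat$bmi <= 25)
index_overweight  <- which(dat$bmi > 25 & dat$bmi < 30)
index_obese       <- which(dat$bmi >= 30)

# Fill each group of rows in one vectorised assignment
dat$bmi_cat[index_underweight] <- "Underweight"
dat$bmi_cat[index_normal]      <- "Normal"
dat$bmi_cat[index_overweight]  <- "Overweight"
dat$bmi_cat[index_obese]       <- "Obese"

print(dat)
```

Rows with a missing bmi are skipped by which(), so they would simply keep NA in bmi_cat.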

I am assuming that the bmi_cat column already exists in the data frame.
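
As an alternative, the ifelse() route mentioned in the comments could look roughly like the sketch below. The sample data is again invented for illustration. I went with nested ifelse() rather than cut() here because cut() closes every interval on the same side, so it would not reproduce the mixed boundaries from the question (<= 25 but >= 30) without adjusting the breaks.

```r
# Hypothetical sample data; the real dat comes from the question
dat <- data.frame(bmi = c(17.0, 18.5, 22.3, 25.0, 27.8, 30.0, 35.2))

# Each test is only reached when the previous ones were FALSE,
# so the lower bound of every category is implied
dat$bmi_cat <- ifelse(dat$bmi <= 18.5, "Underweight",
               ifelse(dat$bmi <= 25,   "Normal",
               ifelse(dat$bmi < 30,    "Overweight",
                                       "Obese")))

print(dat)
```

This version does not need bmi_cat to exist beforehand, since the whole column is assigned at once.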

Javier