I am working with a data frame in which one column holds the Body Mass Index (BMI) of people, and I want to write a function that takes those BMIs and returns a column with their interpretation (underweight, normal, etc.).

My function takes three arguments: the data frame, the name of the age column, and the name of the BMI column (age is needed because the interpretation differs for children).

So I tried a nested ifelse() inside my function, but the function returns a column that contains only the value for the TRUE branch of the first ifelse(); every other row is NA. When I run the same code directly on my data frame, it works! Please help, I don't know what I am missing...

This is my function (IMC stands for BMI in French), followed by the call on my table:

        my_function = function(tableau, age, imc){

        stopifnot(age %in% colnames(tableau), imc %in% colnames(tableau))
        stopifnot(is.numeric(tableau[, age]), is.numeric(tableau[, imc]))

        interp = ifelse(tableau$age <= 18, "pas applicable pour enfant", 
                 ifelse(tableau$imc < 16.5, "dénutrition", 
                 ifelse(tableau$imc >= 16.5 & tableau$imc < 18.5, "maigreur",
                 ifelse(tableau$imc >= 18.5 & tableau$imc < 25, "corpulance normale",
                 ifelse(tableau$imc >= 25 & tableau$imc < 30, "surpoids",
                 ifelse(tableau$imc >= 30 & tableau$imc < 35, "obésité modérée",
                 ifelse(tableau$imc >= 35 & tableau$imc < 40, "obésité sévère",
                 ifelse(tableau$imc >= 40, "obésité morbide", "PA")))))))) 
        tableau = cbind(tableau, interpIMC_A = c(interp))
        }
        tab_preuve = my_function(tab_preuve, "age", "IMC")

This is how I did it without a function (this works, while the same code inside the function does not):

        interp = ifelse(tab_preuve$age <= 18, "pas applicable pour enfant", 
                 ifelse(tab_preuve$IMC < 16.5, "dénutrition", 
                 ifelse(tab_preuve$IMC >= 16.5 & tab_preuve$IMC < 18.5, "maigreur",
                 ifelse(tab_preuve$IMC >= 18.5 & tab_preuve$IMC < 25, "corpulance normale",
                 ifelse(tab_preuve$IMC >= 25 & tab_preuve$IMC < 30, "surpoids",
                 ifelse(tab_preuve$IMC >= 30 & tab_preuve$IMC < 35, "obésité modérée",
                 ifelse(tab_preuve$IMC >= 35 & tab_preuve$IMC < 40, "obésité sévère",
                 ifelse(tab_preuve$IMC >= 40, "obésité morbide", "PA")))))))) 
        tab_preuve = cbind(tab_preuve, IntIMC = c(interp))

This is the table with the result without the function and with the function

Thank you to everyone who wants to help me (this is driving me crazy!). PS: Sorry for my English and the long post, I hope it is clear.

hey
S. E.
  • Welcome to stackoverflow! Do I understand it correctly, that the "NA" is unexpected? – hey Jan 24 '20 at 02:22
  • You can't use `$` with column names stored in variables. Use `[[` instead. That is, if you have `age <- "AGE"`, `df$age` will look for a column named `"age"`, not `"AGE"`. You need `df[[age]]` and `df[[imc]]` because `age` and `imc` are objects containing column names as strings, not literal column names. – Gregor Thomas Jan 24 '20 at 02:33
  • Also, since your `ifelse` is nested and you arrange your `imc` cutoffs in increasing order, you don't need to bother with the `>=` conditions, the `<` are enough. `...tableau[[imc]] < 16.5, "dénutrition", ifelse(tableau[[imc]] < 18.5, "maigreur", ifelse(tableau[[imc]] < 25, ...`. – Gregor Thomas Jan 24 '20 at 02:36
  • For another way to approach the problem the `cut` function might be easier, [as here](https://stackoverflow.com/a/5570360/903061). I think you can simplify to `ifelse(tableau[[age]] <= 18, "pas applicable pour enfant", cut(tableau[[imc]], breaks = c(0, 16.5, 18.5, 25, 30, 35, 40, Inf), labels = c("dénutrition", "maigreur", "corpulance normale", "surpoids", "obésité modérée", "obésité sévère", "obésité morbide")))` – Gregor Thomas Jan 24 '20 at 02:40
  • Gregor - reinstate Monica, thank you very much for your answers, they solved my problem. I wrapped cut() in as.character() so that the function returns the labels correctly (otherwise it returns integers instead). Thank you all for your answers, they were really helpful! – S. E. Jan 24 '20 at 16:42
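
The comments above can be pulled together into one sketch. The sample data and the function name `interp_imc` are hypothetical, made up for illustration; the cutoffs and labels come from the question. One assumption goes beyond the comments: `right = FALSE` is passed to `cut()` so that the intervals are closed on the left, which matches the `<` comparisons in the original nested `ifelse()`.

```r
# Hypothetical sample data; the column names mirror the question ("age", "IMC").
tab_preuve <- data.frame(
  age = c(12, 25, 40, 33, 60),
  IMC = c(17.0, 15.0, 22.0, 27.5, 41.0)
)

# `$` takes the name literally: there is no column called "imc", so this is NULL.
imc <- "IMC"
is.null(tab_preuve$imc)   # TRUE
tab_preuve[[imc]]         # the IMC column, looked up through the variable

interp_imc <- function(tableau, age, imc) {
  stopifnot(age %in% colnames(tableau), imc %in% colnames(tableau))
  stopifnot(is.numeric(tableau[[age]]), is.numeric(tableau[[imc]]))

  # Assumption: right = FALSE gives intervals [lower, upper), as in the `<` tests.
  classes <- cut(
    tableau[[imc]],
    breaks = c(0, 16.5, 18.5, 25, 30, 35, 40, Inf),
    labels = c("dénutrition", "maigreur", "corpulance normale", "surpoids",
               "obésité modérée", "obésité sévère", "obésité morbide"),
    right = FALSE
  )

  # as.character() keeps the labels; a bare factor inside ifelse() loses them.
  tableau$interpIMC_A <- ifelse(tableau[[age]] <= 18,
                                "pas applicable pour enfant",
                                as.character(classes))
  tableau  # return the data frame explicitly
}

tab_preuve <- interp_imc(tab_preuve, "age", "IMC")
```

This also suggests why the original function produced NA rather than an error: `tableau$age` happens to find a real column, but `tableau$imc` is `NULL`, so the inner `ifelse()` calls return a zero-length vector that the outer call pads with NA.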

1 Answer

Could you post a reprex of the data so that it's easier to work out what's the problem?

Generally though, I'd recommend using the case_when() function (https://dplyr.tidyverse.org/reference/case_when.html) from the dplyr package. It's a vectorized version of an if statement and much easier to work with because it's flat rather than nested, so bugs and syntax errors are easier to spot.

Also, if you're modifying a data frame, you can use it inside a mutate() function to make the code even more readable.

E.g.:

```
library(dplyr)

tableau %>%
  mutate(imc = case_when(
    imc < 16.5 ~ "dénutrition",
    imc >= 16.5 & imc < 18.5 ~ "maigreur"
    # etc...
  ))
```
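
As a fuller sketch of the same idea, the version below wraps `case_when()` in a function that receives the column names as strings, like the function in the question. The function name `interp_imc_dplyr` and the sample data are hypothetical; the cutoffs and labels are taken from the question, and the `.data[[...]]` pronoun is used here as one way to look up a column by a name held in a variable. Because `case_when()` uses the first condition that matches, the `>=` halves of the comparisons are left out.

```r
library(dplyr)

# Hypothetical helper: `age` and `imc` are column names passed as strings.
interp_imc_dplyr <- function(tableau, age, imc) {
  tableau %>%
    mutate(interpIMC_A = case_when(
      .data[[age]] <= 18  ~ "pas applicable pour enfant",
      .data[[imc]] < 16.5 ~ "dénutrition",
      .data[[imc]] < 18.5 ~ "maigreur",
      .data[[imc]] < 25   ~ "corpulance normale",
      .data[[imc]] < 30   ~ "surpoids",
      .data[[imc]] < 35   ~ "obésité modérée",
      .data[[imc]] < 40   ~ "obésité sévère",
      .data[[imc]] >= 40  ~ "obésité morbide"
    ))
}

# Hypothetical sample data with the column names used in the question.
exemple <- data.frame(
  age = c(12, 25, 40, 33, 60),
  IMC = c(17.0, 15.0, 22.0, 27.5, 41.0)
)
resultat <- interp_imc_dplyr(exemple, "age", "IMC")
```

Rows that match none of the conditions (for example a missing IMC for an adult) come back as NA, which is the default behaviour of `case_when()`.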


Adam B.