I'm having difficulty creating a new factor variable from a preexisting numerical variable. I have a numerical variable Age with the age of my participants but want to create a factor variable that categorizes participants' age into different categories. Whenever I run my code I get an error:

"Error: argument "no" is missing, with no default."

I have tried several variations of the code below, such as writing the new factor levels without quotes, using : for ranges, etc.

data.frame%>%
    mutate(Age = ifelse(Age < 20, "0"),
           ifelse(Age >= 20 & Age <= 29, "1"),
                  ifelse(Age >=30 & Age <= 39, "2"),
                        ifelse(Age >= 40 & Age <=49, "3"),
                               ifelse(Age >= 50 & Age <= 59, "4"),
                                     ifelse(Age >= 60 & Age <= 69, "5"),
                                           ifelse(Age >= 70, "6", NA))
Sathish
Austin
    Possible duplicate of [Group numeric values by the intervals](http://stackoverflow.com/questions/13559076/group-numeric-values-by-the-intervals) – Ronak Shah Mar 13 '17 at 01:15

2 Answers

cut() is the easiest way to do this.

In base R:

Age <- seq(10,80,by=10)
cut(Age,breaks=c(-Inf,seq(20,70,by=10),Inf),
        right=FALSE,
        labels=as.character(0:6))

I'll leave you to embed this in mutate() as you like.
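
As a minimal sketch of that embedding, assuming dplyr is loaded and the data frame has a numeric Age column (the names df1 and Age_cat are illustrative, not taken from the question):

```r
library(dplyr)

# Illustrative data: ages on either side of several boundaries, plus a missing value
df1 <- data.frame(Age = c(5, 19, 20, 29, 30, 69, 70, 85, NA))

df1 <- df1 %>%
  mutate(Age_cat = cut(Age,
                       breaks = c(-Inf, seq(20, 70, by = 10), Inf),
                       right  = FALSE,             # intervals are [lower, upper)
                       labels = as.character(0:6)))
```

cut() already returns a factor, so no extra factor() call is needed, and a missing Age stays NA.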

The problem with your code is that you don't have the choices nested properly: compare this snippet carefully to your code ...

Age = ifelse(Age < 20, "0",
         ifelse(Age >= 20 & Age <= 29, "1",
            ifelse(...,[yes],[no])))
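
To see where the error comes from, here is a small base R sketch on a made-up two-element vector (the values and the names res and Age_cat are only for illustration). ifelse() takes three arguments, test, yes and no, and the missing no is only noticed once some element of the test is FALSE:

```r
Age <- c(15, 25)

# No `no` argument: the second element fails the test, so R needs `no`
# and stops with the error from the question. tryCatch() captures the message.
res <- tryCatch(ifelse(Age < 20, "0"),
                error = function(e) conditionMessage(e))

# Nested properly: the inner ifelse() is the `no` argument of the outer one.
Age_cat <- ifelse(Age < 20, "0",
                  ifelse(Age <= 29, "1", NA))
```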
Ben Bolker

The closing parentheses ")" belong at the very end, after the innermost ifelse, so that each ifelse is the "no" argument of the one before it:

df1 <- data.frame(Age=c(1:80,NA))

df1%>%
    mutate(Age_cat = factor(ifelse(Age < 20, "0",
           ifelse(Age >= 20 & Age <= 29, "1",
                  ifelse(Age >=30 & Age <= 39, "2",
                        ifelse(Age >= 40 & Age <=49, "3",
                               ifelse(Age >= 50 & Age <= 59, "4",
                                     ifelse(Age >= 60 & Age <= 69, "5",
                                           ifelse(Age >= 70, "6", NA)))))))))

However, you should also know that in dplyr, this is the perfect opportunity for case_when:

df1 %>%
mutate(Age_cat= factor(case_when(
  .$Age <  20 ~ "0",
  .$Age >= 20 & .$Age <= 29 ~ "1",
  .$Age >= 30 & .$Age <= 39 ~"2",
  .$Age >= 40 & .$Age <=49 ~  "3",
  .$Age >= 50 & .$Age <= 59 ~ "4",
  .$Age >= 60 & .$Age <= 69 ~ "5",
  .$Age >= 70 ~ "6"))
)
   Age Age_cat
1    1       0
2    2       0
3    3       0
4    4       0
5    5       0
...
13  13       0
14  14       0
15  15       0
16  16       0
17  17       0
18  18       0
19  19       0
20  20       1
21  21       1
22  22       1
23  23       1
24  24       1
...
79  79       6
80  80       6
81  NA    <NA>
Pierre Lapointe
  • Another thing: since the arguments of `case_when` are evaluated in order, I think it's sufficient to write `.$Age < 20 ~ "0", .$Age <= 29 ~ "1"` and so on. – Scarabee Mar 13 '17 at 01:10
  • You are right on both counts. I changed the `TRUE ~"6"` but I'll leave it to OP to change it to `.$Age <= 29 ~ "1"` if he wishes to. – Pierre Lapointe Mar 13 '17 at 01:12
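
The shortened form suggested in the comment above could look like the following sketch. It assumes a dplyr version in which columns can be referenced directly inside case_when() within mutate(), so the .$ prefix is dropped; the last condition is an explicit test rather than TRUE, so that a missing Age is not assigned to the top category.

```r
library(dplyr)

df1 <- data.frame(Age = c(1:80, NA))

df1 <- df1 %>%
  mutate(Age_cat = factor(case_when(
    Age <  20 ~ "0",
    Age <= 29 ~ "1",   # only reached when Age >= 20
    Age <= 39 ~ "2",
    Age <= 49 ~ "3",
    Age <= 59 ~ "4",
    Age <= 69 ~ "5",
    Age >= 70 ~ "6"    # explicit test, so a missing Age stays NA
  )))
```

Because case_when() uses the first condition that matches, each lower bound is implied by the conditions before it.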