
I am building an age-group variable. Later, when I make a table, I want all categories to show up. Should I make it a factor? How would I do it?

My code is:

library(dplyr)

# 8 breaks give 7 (lower, upper] intervals, one per label
# (17.999 separates Adolescent from Adult)
df <- df %>%
  mutate(AGEGROUP = cut(AGE,
                        breaks = c(-Inf, 0, 0.082, 1.999, 12.999, 17.999, 64.999, 200),
                        right = TRUE,
                        labels = c("Foetus(0 yr)",
                                   "Neonate (0.001 - 0.082 yr)",
                                   "Infant(0.083-1.999 yrs)",
                                   "Child(2-12.999 yrs)",
                                   "Adolescent(13-17.999 yrs)",
                                   "Adult(18-64.999 yrs.)",
                                   "Elderly(65-199 yrs)")))

df <- df %>%
  group_by(AGEGROUP) %>%
  summarise("people count" = n())

Right now, if I only have people in two categories, the table only shows those two rows.

I want a table that lists all the age groups, including the ones with no people.

Stataq

1 Answer

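Use `count()` with `.drop = FALSE`. `cut()` already returns a factor, so you don't need to convert it yourself; `.drop = FALSE` keeps the factor levels that have no rows: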

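library(dplyr)

df %>%
  # cut() returns a factor, so all seven labels are stored as levels
  # even when no row falls into some of them.
  # 8 breaks give 7 intervals, one per label (17.999 separates Adolescent from Adult)
  mutate(AGEGROUP = cut(AGE,
                        breaks = c(-Inf, 0, 0.082, 1.999, 12.999, 17.999, 64.999, 200),
                        right = TRUE,
                        labels = c("Foetus(0 yr)",
                                   "Neonate (0.001 - 0.082 yr)",
                                   "Infant(0.083-1.999 yrs)",
                                   "Child(2-12.999 yrs)",
                                   "Adolescent(13-17.999 yrs)",
                                   "Adult(18-64.999 yrs.)",
                                   "Elderly(65-199 yrs)"))) %>%
  # .drop = FALSE keeps the empty levels and reports them with a count of 0
  count(AGEGROUP, name = 'people_count', .drop = FALSE)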
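A minimal sketch to check this on made-up data. The `toy` data frame and its ages are hypothetical, and the breaks assume an Adolescent/Adult boundary at 17.999 so that each label gets its own interval. It shows that `cut()` already yields a factor, and that both `count(.drop = FALSE)` and `group_by(.drop = FALSE)` + `summarise()` keep the empty age groups with a count of 0.

```r
library(dplyr)

# Hypothetical toy data: only adults and elderly people are present
toy <- data.frame(AGE = c(25, 40, 70, 33))

# Assumed breaks: 8 boundaries -> 7 (lower, upper] intervals, one per label
age_breaks <- c(-Inf, 0, 0.082, 1.999, 12.999, 17.999, 64.999, 200)
age_labels <- c("Foetus(0 yr)",
                "Neonate (0.001 - 0.082 yr)",
                "Infant(0.083-1.999 yrs)",
                "Child(2-12.999 yrs)",
                "Adolescent(13-17.999 yrs)",
                "Adult(18-64.999 yrs.)",
                "Elderly(65-199 yrs)")

toy <- toy %>%
  mutate(AGEGROUP = cut(AGE, breaks = age_breaks, labels = age_labels, right = TRUE))

# cut() already returns a factor whose levels are all seven labels
is.factor(toy$AGEGROUP)
levels(toy$AGEGROUP)

# Option 1: count() keeps empty factor levels when .drop = FALSE
res_count <- toy %>%
  count(AGEGROUP, name = "people_count", .drop = FALSE)

# Option 2: the same result with group_by(.drop = FALSE) + summarise()
res_summ <- toy %>%
  group_by(AGEGROUP, .drop = FALSE) %>%
  summarise(people_count = n(), .groups = "drop")

res_count
res_summ
```

Option 2 is handy if you already use `group_by()` + `summarise()` and want to compute more than a count; the `.drop = FALSE` just has to go on `group_by()` instead.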
Ronak Shah
  • Do you think `count` is a better way than `group_by(AGEGROUP) %>% summarise("people count" = n())`? Right now I always use `summarise` to count rows. Should I switch? – Stataq Mar 14 '21 at 02:42
  • If you only want `n()`, then `count` is better. If you want to do something else after `group_by(AGEGROUP)`, you can continue with `summarise("people count" = n())`. – Ronak Shah Mar 14 '21 at 03:01
  • If I merge two tables produced by `count`, how can I keep the rows in age-group level order? Each table is in the right order on its own, but after merging the order changes. Is there any way to solve this? – Stataq Mar 14 '21 at 03:38
  • After merging, you need to `arrange` the data again. If you want the data in a specific order, this might help: https://stackoverflow.com/questions/11977102/order-data-frame-rows-according-to-vector-with-specific-order – Ronak Shah Mar 14 '21 at 03:44
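A minimal sketch of the merge-then-arrange fix. The two count tables `counts_a` and `counts_b` and their numbers are hypothetical stand-ins for two `count()` results. Re-applying `factor(..., levels = age_labels)` before `arrange()` makes the sort follow the age-group order even if the key column lost its factor levels along the way.

```r
library(dplyr)

# Hypothetical labels and count tables; in practice these come from count()
age_labels <- c("Foetus(0 yr)",
                "Neonate (0.001 - 0.082 yr)",
                "Infant(0.083-1.999 yrs)",
                "Child(2-12.999 yrs)",
                "Adolescent(13-17.999 yrs)",
                "Adult(18-64.999 yrs.)",
                "Elderly(65-199 yrs)")

counts_a <- data.frame(AGEGROUP = factor(age_labels, levels = age_labels),
                       count_a = c(0, 0, 1, 4, 2, 10, 3))
counts_b <- data.frame(AGEGROUP = factor(age_labels, levels = age_labels),
                       count_b = c(0, 1, 0, 2, 5, 8, 6))

# merge() sorts on the key column by default (sort = TRUE),
# which need not follow the factor-level order of AGEGROUP
merged <- merge(counts_a, counts_b, by = "AGEGROUP")

# Re-assert the intended level order, then sort the rows by it
merged <- merged %>%
  mutate(AGEGROUP = factor(AGEGROUP, levels = age_labels)) %>%
  arrange(AGEGROUP)

merged
```

If you join with dplyr instead, `left_join(counts_a, counts_b, by = "AGEGROUP")` should keep the row order of `counts_a`, so the extra `arrange()` is often unnecessary there.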