Is it possible to create a density plot from this population data? `age_group` is a categorical variable. Does it have to be numeric to create a density plot?

library(tidyverse)

df <- structure(list(
  year = c(1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971,
           1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971, 1971),
  age_group = structure(2:19, .Label = c(
    "All ages", "0 to 4 years", "5 to 9 years", "10 to 14 years",
    "15 to 19 years", "20 to 24 years", "25 to 29 years", "30 to 34 years",
    "35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years",
    "55 to 59 years", "60 to 64 years", "65 to 69 years", "70 to 74 years",
    "75 to 79 years", "80 to 84 years", "85 to 89 years", "90 to 94 years",
    "95 to 99 years", "100 years and over", "Median age"), class = "factor"),
  population = c(1836149, 2267794, 2329323, 2164092, 1976914, 1643264,
                 1342744, 1286302, 1284154, 1252545, 1065664, 964984,
                 785693, 626521, 462065, 328583, 206174, 101117)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -18L))
ixodid
  • Do you have an example in mind of what a density plot of a categorical variable would look like? – camille Oct 04 '19 at 03:08

1 Answer

You can convert the text to numeric ranges, e.g.:

library(tidyverse) # if not already loaded

df %>%
  # These extract the 1st and 3rd "word" of age_group
  # Uses stringr::word(), loaded as part of tidyverse
  mutate(age_min = word(age_group, 1) %>% as.numeric,
         age_max = word(age_group, 3) %>% as.numeric) %>%
  head
# A tibble: 6 x 5
   year age_group      population age_min age_max
  <dbl> <fct>               <dbl>   <dbl>   <dbl>
1  1971 0 to 4 years      1836149       0       4
2  1971 5 to 9 years      2267794       5       9
3  1971 10 to 14 years    2329323      10      14
4  1971 15 to 19 years    2164092      15      19
5  1971 20 to 24 years    1976914      20      24
6  1971 25 to 29 years    1643264      25      29 

From there, you can plot it with ggplot2 in several ways (`...` stands for the pipeline above, without the final `head`):

...  %>%
ggplot(aes(age_min, population)) + 
  geom_step()

...  %>%
ggplot(aes(age_min, population)) + 
  geom_col()
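
Note that the bars above show raw counts. If what you want is a density in the statistical sense (bar areas summing to 1), one option is to divide each group's share of the total by the width of the group in years. Here is a minimal base-R sketch of that, assuming every group spans five whole years as in the 1971 data. `lower`, `width` and `pop_density` are names made up for illustration, and the labels are rebuilt rather than taken from `df` so the snippet runs on its own:

```r
# Rebuild the 18 age groups and populations from the question's 1971 data.
lower <- seq(0, 85, by = 5)
age_group <- paste(lower, "to", lower + 4, "years")
population <- c(1836149, 2267794, 2329323, 2164092, 1976914, 1643264,
                1342744, 1286302, 1284154, 1252545, 1065664, 964984,
                785693, 626521, 462065, 328583, 206174, 101117)

# Pull both bounds out of labels such as "10 to 14 years".
pattern <- "^([0-9]+) to ([0-9]+) years$"
age_min <- as.numeric(sub(pattern, "\\1", age_group))
age_max <- as.numeric(sub(pattern, "\\2", age_group))

# Each group covers the whole years age_min..age_max inclusive (5 years here).
width <- age_max - age_min + 1

# Density per year of age: share of the total divided by the bin width,
# so that sum(density * width) equals 1.
pop_density <- data.frame(
  age_min = age_min,
  age_max = age_max,
  width   = width,
  density = population / sum(population) / width
)
```

`pop_density` can then be passed to `ggplot(pop_density, aes(age_min, density)) + geom_step()` in the same way as the counts.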

...  %>%
ggplot(aes(age_min, y = population)) + 
  geom_density(stat = "identity")
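
`stat = "identity"` draws the supplied values as they are, so no density estimate is actually computed there. If a smoothed estimate is wanted, one possibility is a weighted kernel density over the group midpoints using base R's `density()`. A rough sketch, assuming ages are spread evenly within each group (so the midpoint of "0 to 4 years" is 2.5) and an arbitrary bandwidth of 5 years. `midpoint`, `w` and `kde` are illustrative names:

```r
# Lower bounds of the 18 five-year groups in the question's 1971 data.
lower <- seq(0, 85, by = 5)
population <- c(1836149, 2267794, 2329323, 2164092, 1976914, 1643264,
                1342744, 1286302, 1284154, 1252545, 1065664, 964984,
                785693, 626521, 462065, 328583, 206174, 101117)

# Assumption: ages are uniform within a group, so use its centre as location.
midpoint <- lower + 2.5

# density() expects weights that sum to 1.
w <- population / sum(population)

# The bandwidth is fixed by hand (5 years is an arbitrary choice for this
# sketch) rather than left to an automatic selector.
kde <- density(midpoint, weights = w, bw = 5)
```

`plot(kde)` draws the curve. With only 18 points the bandwidth has a strong effect, so treat the shape as indicative rather than exact.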

Jon Spring
  • It would help if the answer was [fully reproducible](https://stackoverflow.com/q/5963269/3277821). For example, it's not clear where the functions you're calling are from. – sboysel Oct 04 '19 at 04:09
  • OP loaded `tidyverse`, I assume the answers are attempted after OP script is run. – Jon Spring Oct 04 '19 at 04:37