I have a binary variable in my data frame (whether people are older than 65 or not) and the total number of people in each of the two groups for several years.

I would like to have only one row per year, with one column showing the number of people older than 65 and another showing the number of people aged 65 or younger.

How can I split the count column by the binary variable so that I get two columns?

Thank you very much for your answer.

7Luke7

4 Answers

Is this what you're looking for?

dat <- data.frame(over65 = rep(c(0,1), 5), 
           year = rep(2016:2020, each=2), 
           n=round(runif(10, 100, 200)))

dat
#    over65 year   n
# 1       0 2016 176
# 2       1 2016 109
# 3       0 2017 133
# 4       1 2017 142
# 5       0 2018 150
# 6       1 2018 110
# 7       0 2019 127
# 8       1 2019 138
# 9       0 2020 151
# 10      1 2020 159

library(tidyr)  # provides pivot_wider() and re-exports the %>% pipe

dat %>% pivot_wider(names_from="over65", values_from="n", names_prefix="over65_")
# # A tibble: 5 x 3
#    year over65_0 over65_1
#   <int>    <dbl>    <dbl>
# 1  2016      176      109
# 2  2017      133      142
# 3  2018      150      110
# 4  2019      127      138
# 5  2020      151      159
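
If the column names should say directly which age group they hold, the wide result can be renamed afterwards. This is a minimal sketch, assuming the same `dat` layout as above; the seed and the names `younger_65` and `older_65` are illustrative choices, not part of the original answer:

```r
library(tidyr)
library(dplyr)

set.seed(42)  # arbitrary seed, only so the random counts are reproducible
dat <- data.frame(over65 = rep(c(0, 1), 5),
                  year = rep(2016:2020, each = 2),
                  n = round(runif(10, 100, 200)))

wide <- dat %>%
  pivot_wider(names_from = over65, values_from = n, names_prefix = "over65_") %>%
  # over65_0 holds the counts for over65 == 0, over65_1 those for over65 == 1
  rename(younger_65 = over65_0, older_65 = over65_1)

wide
```

Renaming after the pivot keeps the `pivot_wider()` call itself unchanged, so the two steps can be checked separately.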


DaveArmstrong

Here is a data.table approach:

# H/t to DaveArmstrong for the data!

dat <- data.frame(over65 = rep(c(0,1), 5), 
           year = rep(2016:2020, each=2), 
           n=round(runif(10, 100, 200)))

library(data.table)

setDT(dat)

# value.var names the column to spread; without it dcast() has to guess
dcast(dat, year ~ paste0("over65_", over65), value.var = "n", fun.aggregate = sum)


#>    year over65_0 over65_1
#> 1: 2016      146      159
#> 2: 2017      134      120
#> 3: 2018      164      113
#> 4: 2019      185      163
#> 5: 2020      180      114

Created on 2021-03-18 by the reprex package (v0.3.0)

Eric

We can use xtabs in base R:

xtabs(n ~ year + over65, dat)
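
Note that `xtabs()` returns a contingency table rather than a data frame. If a data frame with one row per year is needed, one possible conversion is sketched below; the seed, the `over65_` prefix, and the variable names `tab` and `wide` are assumptions made for illustration:

```r
set.seed(42)  # arbitrary seed, only so the random counts are reproducible
dat <- data.frame(over65 = rep(c(0, 1), 5),
                  year = rep(2016:2020, each = 2),
                  n = round(runif(10, 100, 200)))

tab <- xtabs(n ~ year + over65, data = dat)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# would turn the table back into long format
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("over65_", names(wide))

# the years end up as row names, so move them into a regular column
wide$year <- as.integer(rownames(wide))
rownames(wide) <- NULL

wide
```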

akrun

library(dplyr)
dat1 <- dat %>% 
  group_by(year) %>% 
  summarize(older_65 = case_when(over65==1 ~ n),
         younger_65 = case_when(over65==0 ~ n)) %>% 
  mutate(older_65=lead(older_65)) %>% 
  na.omit()

Data (borrowed from DaveArmstrong):

set.seed(123)
dat <- data.frame(over65 = rep(c(0,1), 5), 
                  year = rep(2016:2020, each=2), 
                  n=round(runif(10, 100, 200)))
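
The same result can probably be reached without `lead()` and `na.omit()` by subsetting `n` inside `summarize()`, which returns exactly one row per year. This is a hedged sketch of that variant, not the answer's original code. It may also be worth knowing that dplyr 1.1.0 and later warn when `summarize()` returns more than one row per group, which the `case_when()` version does.

```r
library(dplyr)

set.seed(123)
dat <- data.frame(over65 = rep(c(0, 1), 5),
                  year = rep(2016:2020, each = 2),
                  n = round(runif(10, 100, 200)))

dat1 <- dat %>%
  group_by(year) %>%
  # pick the count belonging to each age group within the year;
  # sum() also covers the case of several rows per group and year
  summarize(older_65 = sum(n[over65 == 1]),
            younger_65 = sum(n[over65 == 0]),
            .groups = "drop")

dat1
```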


TarJae