Pivoting selected rows into new columns

Question

I am working with census data in a region. I have the total, urban, and rural populations of each municipality in the region, identified by a municipality code. In my data frame, the three population types are "stacked" in one population variable, such that rows containing municipality codes store total populations but not urban or rural ones:

location	population	mun_code
municipality_1	10000	1
urban	6000	1
rural	4000	1
municipality_2	15000	2
urban	10000	2
rural	5000	2

I want to construct a second data frame with one variable for each population type:

location	total_pop	urban_pop	rural_pop	mun_code
municipality_1	10000	6000	4000	1
municipality_2	15000	10000	5000	2

I have tried using pivot_wider() from package tidyr, but can't seem to get the table to look the way I want.

akrun · Accepted Answer · 2023-03-05T20:36:26.483

We could create a new column based on the 'municipality' substring in the location and reshape to 'wide' format with pivot_wider

library(dplyr)
library(tidyr)
library(stringr)
df1 %>% 
  mutate(grp = case_when(str_detect(location, 'municipality') ~'total_pop', 
  TRUE ~ str_c(location, '_pop')), 
  location = case_when(grp == 'total_pop' ~ location)) %>% 
  fill(location) %>%
  pivot_wider(names_from = grp, values_from = population)

-output

# A tibble: 2 × 5
  location       mun_code total_pop urban_pop rural_pop
  <chr>             <int>     <int>     <int>     <int>
1 municipality_1        1     10000      6000      4000
2 municipality_2        2     15000     10000      5000

data

df1 <-structure(list(location = c("municipality_1", "urban", "rural", 
"municipality_2", "urban", "rural"), population = c(10000L, 6000L, 
4000L, 15000L, 10000L, 5000L), mun_code = c(1L, 1L, 1L, 2L, 2L, 
2L)), class = "data.frame", row.names = c(NA, -6L))

switched a bit of that code around but it worked! thanks for the easy fix — megsruppUNBC, Mar 05 '23 at 20:16

score 0 · Answer 2 · answered Mar 05 '23 at 20:34

Using replace and paste to get the names before pivot_wider

library(dplyr)
library(tidyr)

df %>% 
  mutate(loc = paste0(replace(location, grepl("municipality", location), "total"), "_pop"),
         location = paste0("municipality_", mun_code)) %>% 
  pivot_wider(names_from = loc, values_from = population)
# A tibble: 2 × 5
  location       mun_code total_pop urban_pop rural_pop
  <chr>             <int>     <int>     <int>     <int>
1 municipality_1        1     10000      6000      4000
2 municipality_2        2     15000     10000      5000

jay.sf · Answer 3 · 2023-03-06T06:27:28.993

Your problem is, your long format data isn't coded properly: 1. the municipality_ levels in location shouldn't be named differently depending on location, call them total as they are already identified in mun_code, and 2. if you want mun_code also as a factor, then make one.

muns <- grep('municipality_', dat$location)
dat$location[muns] <- 'total'

dat$municipality <- factor(dat$mun_code, labels=paste0('municipality_', unique(dat$mun_code)))

dat
#   location population mun_code   municipality
# 1    total      10000        1 municipality_1
# 2    urban       6000        1 municipality_1
# 3    rural       4000        1 municipality_1
# 4    total      15000        2 municipality_2
# 5    urban      10000        2 municipality_2
# 6    rural       5000        2 municipality_2

Then it's just

reshape(dat, idvar='mun_code', timevar='location', v.names='population', direction='wide')
#   mun_code   municipality population.total population.urban population.rural
# 1        1 municipality_1            10000             6000             4000
# 4        2 municipality_2            15000            10000             5000

or whatever fancy package you want to use to reshape your data.

Data:

dat <- structure(list(location = c("municipality_1", "urban", "rural", 
"municipality_2", "urban", "rural"), population = c(10000L, 6000L, 
4000L, 15000L, 10000L, 5000L), mun_code = c(1L, 1L, 1L, 2L, 2L, 
2L)), class = "data.frame", row.names = c(NA, -6L))

Pivoting selected rows into new columns

3 Answers3

data