1

I am trying to manipulate a wide table which represents the per cent of each household composition type within two towns to a long-form table (basically, a reverse of a pivot table).

In the long table, I would like each row to represent the household composition value for one household. So, the number of rows for each combination depends on the values provided e.g. 18 rows of (town.a, singles), 8 rows of (town.b, singles etc.). However, I just can't seem to figure out how to do this expansion based on the values in each Town column.

I have a data.frame() that looks like this:

household.data <- data.frame(household.composition= c("Singles","Couples", "Families", "Single Parents", "Sharers"),
                             town.a =c(18,29,41,3,3),
                             town.b =c(8,37,48,9,3))

enter image description here

The values under the Town A and Town B columns represent the per cent makeup of each household composition within each town.

The goal is to get from this wide summary format to a long format which multiplies the value in the Household Composition column by the numeric value within the Town A and Town B columns. So each row would represent the household composition value for one household. For example:

enter image description here

Again, I know that there must be a way to do this using the spread/gather or pivot function in tidyR. However, I just can't seem to figure out how to do this expansion given that I would like the number of rows to correspond with the per cent value.

CarlaBirdy
  • 137
  • 1
  • 12
  • Do you want there to be for example, 18 rows of (town.a, singles), 8 rows of (town.b, singles etc.? So, the number of rows for each combination depends on the values provided? – HNSKD Apr 29 '20 at 03:59
  • Hi, yes that is exactly what I am after! :) – CarlaBirdy Apr 29 '20 at 04:00

4 Answers4

4

You can get the data in long format and use uncount to replicate rows.

library(tidyr)
pivot_longer(household.data, cols = -household.composition) %>% uncount(value)

# A tibble: 199 x 2
#   household.composition name  
#   <chr>                 <chr> 
# 1 Singles               town.a
# 2 Singles               town.a
# 3 Singles               town.a
# 4 Singles               town.a
# 5 Singles               town.a
# 6 Singles               town.a
# 7 Singles               town.a
# 8 Singles               town.a
# 9 Singles               town.a
#10 Singles               town.a
# … with 189 more rows
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

You can work as follows:

  1. Convert the data from wide to long format using tidyr::pivot_longer
  2. Use lapply to apply the rep-licate function based on the number of times in value
  3. Since lapply gives results as list, use dplyr::bind_rows to bind them into a dataframe
  4. Remove the value column to get the desired output
library(dplyr)
library(tidyr)
 household.data %>% 
   pivot_longer(-household.composition, names_to = "town") %>% 
   lapply(rep, .$value) %>% 
   bind_rows() %>%
   select(-value)
UseR10085
  • 7,120
  • 3
  • 24
  • 54
HNSKD
  • 1,614
  • 2
  • 14
  • 25
1

Base R solution:

setNames(within(
  reshape(
    household.data,
    direction = "long",
    varying = grepl("town", names(household.data)),
    timevar = "town_type",
    times = NULL,
    idvar = !(grepl("town", names(household.data))),
    new.row.names = 1:(nrow(household.data) * length(grepl(
      "town", names(household.data)
    )))
  ),
  {
    rm(town)
  }
), c("household.composition", "town"))
hello_friend
  • 5,682
  • 1
  • 11
  • 15
1

data.table solution

library(data.table)
melt(setDT(household.data),id.vars = "household.composition")[rep(1:.N,value),.( household.composition,variable)]

Frank Zhang
  • 1,670
  • 7
  • 14