0

I have this short data frame:

population.served <- c(200:210)
area <- c("Cambridge", "Oxford","Cambridge", "Oxford", "Cambridge", "Oxford","London","Cambridge", "Oxford", "London","Edinburgh")
year <- c("Year.1", "Year.1","Year.2", "Year.2","Year.3", "Year.3","Year.3", "Year.4", "Year.4","Year.4","Year.4" )
data <- data.frame(population.served, area, year) 

How do I make it so that all area and year entries include a value for the population.served, even if they are zero values?

I want the data to look like this:

population.served <- c(200, 201, 0, 0, 202, 203, 0, 0, 204, 205, 206, 0, 207, 208, 209, 210)
area <- c("Cambridge", "Oxford","London","Edinburgh", "Cambridge", "Oxford","London","Edinburgh","Cambridge", "Oxford","London","Edinburgh","Cambridge", "Oxford","London","Edinburgh")
year <- c("Year.1", "Year.1","Year.1", "Year.1","Year.2", "Year.2","Year.2", "Year.2","Year.3", "Year.3","Year.3", "Year.3","Year.4", "Year.4","Year.4","Year.4" )
data2 <- data.frame(population.served, area, year) 
Thirst for Knowledge
  • 1,606
  • 2
  • 26
  • 43

3 Answers3

3

You can use complete from package tidyr :

library("tidyr")
data %>% complete(area, year, fill = list(population.served = 0))
# # A tibble: 16 × 3
#         area   year population.served
#       <fctr> <fctr>             <dbl>
# 1  Cambridge Year.1               200
# 2  Cambridge Year.2               202
# 3  Cambridge Year.3               204
# 4  Cambridge Year.4               207
# 5  Edinburgh Year.1                 0
# 6  Edinburgh Year.2                 0
# 7  Edinburgh Year.3                 0
# 8  Edinburgh Year.4               210
# .....
Victorp
  • 13,636
  • 2
  • 51
  • 55
2

Here's one approach, using expand.grid from base R to fill out your table:

# make a dummy table with all time steps for all units
DF <- with(data, expand.grid(area = unique(area), year = unique(year)))

# merge the data with that table, using all.x = TRUE to keep the larger set
DF <- merge(DF, data, all.x = TRUE)

# replace the NAs in the expanded data frame with 0s
DF[is.na(DF)] = 0
ulfelder
  • 5,305
  • 1
  • 22
  • 40
1

An approach with the fast data.table package:

library(data.table)
setDT(data)[CJ(area = area, year = year, unique = TRUE), on = c('area', 'year')
            ][is.na(population.served), population.served := 0][]

the result is then:

    population.served      area   year
 1:               200 Cambridge Year.1
 2:               202 Cambridge Year.2
 3:               204 Cambridge Year.3
 4:               207 Cambridge Year.4
 5:                 0 Edinburgh Year.1
 6:                 0 Edinburgh Year.2
 7:                 0 Edinburgh Year.3
 8:               210 Edinburgh Year.4
 9:                 0    London Year.1
10:                 0    London Year.2
11:               206    London Year.3
12:               209    London Year.4
13:               201    Oxford Year.1
14:               203    Oxford Year.2
15:               205    Oxford Year.3
16:               208    Oxford Year.4
h3rm4n
  • 4,126
  • 15
  • 21