I have the following data frame:

library(dplyr)
library(tibble)


df <- tibble(
  source = c("a", "b", "b"),
  day = c("D1", "D2", "D3"),
  score = c(10, 5, 3)
)


df

Which looks like this:

> df
# A tibble: 3 x 3
  source day   score
  <chr>  <chr> <dbl>
1 a      D1       10
2 b      D2        5
3 b      D3        3

The values of source and day in df are incomplete. The full sets of values are stored in two vectors:

complete_source <- c("a", "b", "c")
complete_day <- c("D1", "D2", "D3", "D4")

I want to complete the data frame so that it contains every combination of complete_source and complete_day, filling the missing score values with zero (0).

The desired result is (hand made):

 source day   score
 a      D1       10
 a      D2        0
 a      D3        0
 a      D4        0
 b      D2        5
 b      D3        3
 ... etc ...
 c      D1        0
 c      D2        0
 c      D3        0
 c      D4        0
 ...etc

How can I achieve that?

littleworth

1 Answer

We can use complete() from tidyr, passing the full vectors for source and day and a fill value for score:

library(tidyr)
library(dplyr)
complete(df, source = complete_source, day = complete_day, fill = list(score = 0))
# A tibble: 12 x 3
#   source day   score
#   <chr>  <chr> <dbl>
# 1 a      D1       10
# 2 a      D2        0
# 3 a      D3        0
# 4 a      D4        0
# 5 b      D1        0
# 6 b      D2        5
# 7 b      D3        3
# 8 b      D4        0
# 9 c      D1        0
#10 c      D2        0
#11 c      D3        0
#12 c      D4        0
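
The explicit vectors matter here. As a small sketch (the name only_present is just for illustration), passing the bare column names instead makes complete() expand only over values that already occur in df, so source "c" and day "D4" would never appear:

```r
library(tidyr)
library(tibble)

df <- tibble(
  source = c("a", "b", "b"),
  day = c("D1", "D2", "D3"),
  score = c(10, 5, 3)
)

# Bare column names: expand over the values present in the data only
only_present <- complete(df, source, day, fill = list(score = 0))

# 2 observed sources x 3 observed days, so "c" and "D4" are missing
nrow(only_present)
unique(only_present$source)
```

Supplying source = complete_source and day = complete_day, as in the call above, is what brings in the combinations that are absent from the data.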

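Or build all combinations of the two vectors with crossing(), left-join the original data, and replace the resulting NA values:

crossing(source = complete_source, day = complete_day) %>%
  left_join(df, by = c("source", "day")) %>%   # explicit keys avoid the "Joining with" message
  mutate(score = replace_na(score, 0))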

In base R, the same can be done with expand.grid() and merge():

transform(
  merge(expand.grid(source = complete_source, day = complete_day,
                    stringsAsFactors = FALSE),  # keep the keys as character
        df, all.x = TRUE),
  score = replace(score, is.na(score), 0))
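
As a quick sanity check, the three approaches can be run side by side and compared. This is only a sketch: the names res_complete, res_join and res_base are made up for illustration, and the base R result is sorted explicitly before comparing so that the check does not rely on merge()'s row order.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  source = c("a", "b", "b"),
  day = c("D1", "D2", "D3"),
  score = c(10, 5, 3)
)
complete_source <- c("a", "b", "c")
complete_day <- c("D1", "D2", "D3", "D4")

# 1. tidyr::complete()
res_complete <- complete(df, source = complete_source, day = complete_day,
                         fill = list(score = 0))

# 2. crossing() + left_join()
res_join <- crossing(source = complete_source, day = complete_day) %>%
  left_join(df, by = c("source", "day")) %>%
  mutate(score = replace_na(score, 0))

# 3. base R: expand.grid() + merge()
res_base <- merge(
  expand.grid(source = complete_source, day = complete_day,
              stringsAsFactors = FALSE),
  df, all.x = TRUE
)
res_base$score[is.na(res_base$score)] <- 0
res_base <- res_base[order(res_base$source, res_base$day), ]

# One row per source/day combination, and the same scores everywhere
stopifnot(nrow(res_complete) == length(complete_source) * length(complete_day))
stopifnot(isTRUE(all.equal(as.data.frame(res_complete), as.data.frame(res_join))))
stopifnot(identical(res_base$score, res_join$score))
```

If any of the stopifnot() calls fails, the results have diverged, for example because df contains a source or day that is not in the complete vectors.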
akrun