2

close but not a duplicate: Proper idiom for adding zero count rows in tidyr/dplyr - I am trying to fill based on existing values in df, but also fill based on data that does not have an id. Similar, but fundamentally different.

For each id, I am trying to make sure each has 3 billing months.

Ideally, for each id I need all three required months to be present in df_complete. If it is not in the data, I am looking to add a row with "Not found" for the values.

Additionally, I want to check against all_ids and add ids that are in all_ids but do not have rows in df

months <- as.data.frame(as.Date(c("2016/7/1","2016/9/1","2016/7/1", "2016/8/1","2016/9/1", "2016/8/1","2016/9/1"))) 
id <- as.data.frame(c("a","a","b","b","b","c","c"))
value <- as.data.frame(c(1,2,3,4,5,6,7))
df <- cbind(id,months,value)
colnames(df) <- c("id","billing months","value")
required_months <- as.data.frame(as.Date(c("2016/7/1", "2016/8/1","2016/9/1")))
colnames(required_months)<- "required months"
all_ids <- as.data.frame(c("a","b", "c", "d"))

df ends up looking like:

id  billing months  value
a   7/1/2016    1
a   9/1/2016    2
b   7/1/2016    3
b   8/1/2016    4
b   9/1/2016    5
c   8/1/2016    6
c   9/1/2016    7

What I'm looking for (df_complete):

id  billing months  value
a   7/1/2016    1
a   8/1/2016    Not Found
a   9/1/2016    2
b   7/1/2016    3
b   8/1/2016    4
b   9/1/2016    5
c   7/1/2016    Not Found
c   8/1/2016    6
c   9/1/2016    7
d   7/1/2016    Not Found
d   8/1/2016    Not Found
d   9/1/2016    Not Found

Looking for a dplyr solution, but other packages work too.

longlivebrew
  • 301
  • 3
  • 16
  • 3
    Check function `complete` from `tidyr`. – AntoniosK Mar 29 '18 at 15:45
  • I also think it's better if you use `required_months <- as.Date(c("2016/7/1", "2016/8/1","2016/9/1"))` and `all_ids <- c("a","b", "c", "d")` as vectors instead of data frames, so they can be used directly within `complete` function. – AntoniosK Mar 29 '18 at 15:57
  • For reference, a similar question: https://stackoverflow.com/q/25956178/ – Frank Mar 29 '18 at 16:11

1 Answers1

5

This looks like a job for tidyr::complete. As you are missing both id variables and months in your original dataset, you'll need to define the values you need filled in via complete. You define what you want to the missing values entered as with fill (although your Not found value will change your column from one that was potentially a column of numbers to a column of characters).

suppressPackageStartupMessages( library(dplyr) )
library(tidyr)

df %>%
    complete(id = c("a","b", "c", "d"), 
             `billing months` = required_months$`required months`,
             fill = list(value = "Not found") )

#> Warning: Column `id` joining character vector and factor, coercing into
#> character vector

#> # A tibble: 12 x 3
#>    id    `billing months` value    
#>    <chr> <date>           <chr>    
#>  1 a     2016-07-01       1        
#>  2 a     2016-08-01       Not found
#>  3 a     2016-09-01       2        
#>  4 b     2016-07-01       3        
#>  5 b     2016-08-01       4        
#>  6 b     2016-09-01       5        
#>  7 c     2016-07-01       Not found
#>  8 c     2016-08-01       6        
#>  9 c     2016-09-01       7        
#> 10 d     2016-07-01       Not found
#> 11 d     2016-08-01       Not found
#> 12 d     2016-09-01       Not found

Created on 2018-03-29 by the reprex package (v0.2.0).

aosmith
  • 34,856
  • 9
  • 84
  • 118
  • OP has an object all_ids you can use instead of typing the vector out again `tidyr::complete(df, id = all_ids[[1]], \`billing months\` = required_months[[1]], fill = list(value = "nope"))` – Frank Mar 29 '18 at 16:01
  • 1
    @Frank Thanks. I realize that, but because I could copy and paste the vector the OP had written out I lazily chose that route rather than typing out an additional object name. :) – aosmith Mar 29 '18 at 16:15