
I have a dataset like this:

id  date     value
1   8/06/12  1
1   8/08/12  1
2   8/07/12  2
2   8/08/12  1

Every id should have a value for every date. When an id is missing a particular date, that row needs to be added with a value of 0. E.g.,

id  date     value
1   8/06/12  1
1   8/07/12  0
1   8/08/12  1
2   8/06/12  0
2   8/07/12  2
2   8/08/12  1

I'm trying to figure out how to add the rows with 0s. There's a good solution here: R - Fill missing dates by group. However, I can't use the tidyr::complete function because I'm using sparklyr and, as far as I know, need to stay within dplyr functions.
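For reference, on a local data frame the linked approach boils down to something like the sketch below (df is my data as shown above); this is exactly the part that doesn't translate to Spark SQL:

library(tidyr)
library(dplyr)

# local-data version only: complete() does not translate to Spark SQL
df %>% complete(id, date, fill = list(value = 0))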


2 Answers


In sparklyr, you must stick to functions that translate to Spark SQL. This is a job for coalesce. First you have to fill out all the pairs of ids and dates you expect to see, so maybe something like:

# cross join of every id with every date, via a dummy `common` column
all_id   <- old_data %>% distinct(id)   %>% mutate(common = 0)
all_date <- old_data %>% distinct(date) %>% mutate(common = 0)
all_both <- all_id %>% full_join(all_date, by = 'common')

# keep every id/date pair; rows missing from old_data come back with
# value = NA, which coalesce() then replaces with 0
data <- old_data %>%
  right_join(all_both %>% select(-common), by = c('id', 'date')) %>%
  mutate(value = coalesce(value, 0))

I have assumed you have all the dates and ids you care about in your old data, though that might not be the case.
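If some of the dates you need never show up in old_data at all, one workaround (just a sketch, assuming a Spark connection sc and that date is stored as a string; the name dates_local is made up here) is to build the full date range locally, copy it into Spark, and use it in place of the distinct(date) step above:

library(sparklyr)
library(dplyr)

# build every date in the range locally; the string format has to match
# however dates are stored in old_data
dates_local <- data.frame(
  date = format(seq(as.Date("2012-08-06"), as.Date("2012-08-08"), by = "day"),
                "%m/%d/%y"),
  stringsAsFactors = FALSE
)

# copy_to() ships the small local table into Spark so the join stays in Spark
all_date <- copy_to(sc, dates_local, "all_dates", overwrite = TRUE) %>%
  mutate(common = 0)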

steveo'america

expand.grid()

Use expand.grid() to create all combinations of id and date. Also note that you should convert date to class Date with as.Date(); otherwise it stays a string and won't sort or compare as a date.

df %>% mutate(date = as.Date(date, "%m/%d/%y")) %>%
  # every id/date combination; rows absent from df get NA for value
  right_join(expand.grid(id = unique(.$id), date = unique(.$date))) %>%
  mutate(value = coalesce(value, 0L)) %>%
  arrange(id, date)

#   id       date value
# 1  1 2012-08-06     1
# 2  1 2012-08-07     0
# 3  1 2012-08-08     1
# 4  2 2012-08-06     0
# 5  2 2012-08-07     2
# 6  2 2012-08-08     1

Reproducible Data

df <- structure(list(id = c(1L, 1L, 2L, 2L),
                     date = c("8/06/12", "8/08/12", "8/07/12", "8/08/12"),
                     value = c(1L, 1L, 2L, 1L)),
                class = "data.frame", row.names = c(NA, -4L))
Darren Tsai
  • `expand.grid` will work well on a `data.frame`, but it will have to be copied into Spark. I believe the magic incantation to do that is to use the `copy` parameter in a join. Something like `right_join(my_local_df, by = ..., copy = TRUE)`. – steveo'america Jan 23 '19 at 19:03
  • Thanks for your great suggestion! I didn't notice that before seeing your comment. – Darren Tsai Jan 23 '19 at 19:10
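Putting those two comments together, a rough sketch of the same idea run against a Spark table (assuming a connection sc and a Spark tbl called sdf with the same columns, and keeping date as a string so no date conversion needs to translate to Spark SQL) could look like:

library(sparklyr)
library(dplyr)

# small local data frame of every id/date pair
all_combos <- expand.grid(id = c(1L, 2L),
                          date = c("8/06/12", "8/07/12", "8/08/12"),
                          stringsAsFactors = FALSE)

# copy = TRUE copies the local table into Spark so the join runs there
sdf %>%
  right_join(all_combos, by = c("id", "date"), copy = TRUE) %>%
  mutate(value = coalesce(value, 0)) %>%
  arrange(id, date)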