Complete column with group_by and complete

Question

I have a small problem using dplyr's group_by function. After running this:

datasetALL %>% group_by(YEAR,Region) %>% summarise(count_number = n())

I get this result:

   YEAR Region count_number
  <int>  <int>        <int>
1  1946      1            2
2  1946      2            3
3  1946      3            1
4  1946      5            1
5  1947      3            1
6  1947      4            1

I would like something like this:

    YEAR Region count_number
   <int>  <int>        <int>
 1  1946      1            2
 2  1946      2            3
 3  1946      3            1
 4  1946      5            1
 5  1946      4            0 #order is not important
 6  1947      1            0
 7  1947      2            0
 8  1947      3            1
 9  1947      4            1
10  1947      5            0

I tried to use complete() from the tidyr package, but it did not add the missing rows.

Please show us how you're using `complete`. Probably, you need to `ungroup` before you run `complete`. Also, it depends on what variables you are `nesting` within `complete`. — eipi10, Apr 19 '17 at 16:55
This previous question seems to cover it... http://stackoverflow.com/questions/22523131/dplyr-summarise-equivalent-of-drop-false-to-keep-groups-with-zero-length-in — Andrew Gustar, Apr 19 '17 at 16:56
For example, run the following code with and without `ungroup`: `mtcars %>% group_by(carb, cyl) %>% tally %>% arrange(cyl, carb) %>% ungroup %>% complete(carb, nesting(cyl), fill=list(n=0))`. — eipi10, Apr 19 '17 at 17:00

score 20 · Accepted Answer · answered Apr 19 '17 at 17:00

Using complete() from the tidyr package should work; its documentation describes the arguments in detail.

What probably happened is that you did not remove the grouping. After summarise(), the result is still grouped by YEAR, and complete() then works within each group: it only looks for missing combinations of YEAR and Region inside a single group, and those combinations are all present already, so nothing is added. Remove the grouping first, then call complete().

datasetALL %>%
    group_by(YEAR, Region) %>%
    summarise(count_number = n()) %>%
    ungroup() %>%   # drop the remaining grouping before completing
    complete(YEAR, Region, fill = list(count_number = 0))   # missing combinations get a count of 0
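To see the effect of the grouping, here is a minimal reproducible sketch. The `datasetALL` below is a hypothetical stand-in (one row per observation, built so the counts match the table in the question), and the `.groups` argument assumes dplyr 1.0 or later. The names `counts`, `still_grouped` and `completed` are only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for datasetALL: one row per observation,
# chosen so that the counts match the table in the question.
datasetALL <- data.frame(
  YEAR   = c(rep(1946L, 7), rep(1947L, 2)),
  Region = c(1L, 1L, 2L, 2L, 2L, 3L, 5L, 3L, 4L)
)

# summarise() drops only the last grouping level,
# so this result is still grouped by YEAR.
counts <- datasetALL %>%
  group_by(YEAR, Region) %>%
  summarise(count_number = n(), .groups = "drop_last")

# Still grouped: complete() only sees the Region values inside each YEAR,
# so there is nothing to add.
still_grouped <- counts %>%
  complete(Region, fill = list(count_number = 0L))

# Ungrouped: every YEAR is crossed with every Region found in the data.
completed <- counts %>%
  ungroup() %>%
  complete(YEAR, Region, fill = list(count_number = 0L))

nrow(still_grouped)  # 6 rows, same as before
nrow(completed)      # 10 rows: 2 years x 5 regions
```

Only the ungrouped call produces the zero-count rows for 1946/4, 1947/1, 1947/2 and 1947/5.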

Pieter

`complete` documentation is awful and does not provide a simple example before launching into `complete/nesting` – Nettle Sep 01 '18 at 22:57
You might say it's in`complete` – user14353 Apr 04 '19 at 17:02

score 2 · Answer 2 · answered Nov 26 '19 at 22:34

As already mentioned, you can solve this entirely with tidyr's complete() and its nesting() helper, where df is the summarised, ungrouped table:

complete(df, YEAR, nesting(Region), fill = list(count_number = 0))

    YEAR Region count_number
   <int>  <int>        <dbl>
 1  1946      1            2
 2  1946      2            3
 3  1946      3            1
 4  1946      4            0
 5  1946      5            1
 6  1947      1            0
 7  1947      2            0
 8  1947      3            1
 9  1947      4            1
10  1947      5            0
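As a rough illustration of what nesting() changes, the sketch below rebuilds the summarised table by hand (this `df` is a hypothetical copy of the question's output) and compares three calls. The variable names `crossed`, `nested_one` and `nested_both` are made up for this example.

```r
library(tidyr)

# Hypothetical df: the ungrouped summary table shown in the question.
df <- data.frame(
  YEAR         = c(1946L, 1946L, 1946L, 1946L, 1947L, 1947L),
  Region       = c(1L, 2L, 3L, 5L, 3L, 4L),
  count_number = c(2L, 3L, 1L, 1L, 1L, 1L)
)

# Plain columns: every YEAR is crossed with every Region.
crossed <- complete(df, YEAR, Region, fill = list(count_number = 0L))

# nesting() around a single column keeps its observed values,
# so this is crossed with YEAR in the same way.
nested_one <- complete(df, YEAR, nesting(Region), fill = list(count_number = 0L))

# nesting() around both columns keeps only the YEAR/Region pairs
# that already occur, so no rows are added.
nested_both <- complete(df, nesting(YEAR, Region), fill = list(count_number = 0L))

nrow(crossed)      # 10
nrow(nested_one)   # 10
nrow(nested_both)  # 6
```

So for this problem the columns to be crossed must not share a single nesting() call; wrapping one column on its own is harmless.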
