2

I have a dataframe that summarizes the number of times birds were observed at their breeding site one each day and each hour during daytime (i.e., when the sun was above the horizon). example:

head(df)
    ID   site day  hr  nObs
1    19  A    202  11  60
2    19  A    202  13  18
3    19  A    202  15  27
4     8  B    188  8   6
5     8  B    188  9   6
6     8  B    188  11  7

However, this dataframe does not include hours when the bird was not observed. Eg. no line for bird 19 on day 202 at 14 with an nObs value of 0.

I'd like to find a way, preferably with dplyr (tidy verse), to add in those rows for when individuals were not observed.

tnt
  • 1,149
  • 14
  • 24
  • Related: [Filling missing dates in a grouped time series - a tidyverse-way?](https://stackoverflow.com/questions/46130246/filling-missing-dates-in-a-grouped-time-series-a-tidyverse-way); [Fastest way to add rows for missing values in a data.frame?](https://stackoverflow.com/questions/10438969/fastest-way-to-add-rows-for-missing-values-in-a-data-frame) – Henrik Dec 17 '18 at 15:27

2 Answers2

3

You can use complete from tidyr, i.e.

library(tidyverse)

df %>% 
 group_by(ID, site) %>% 
 complete(hr = seq(min(hr), max(hr)))

which gives,

# A tibble: 9 x 5
# Groups:   ID, site [2]
     ID site     hr   day  nObs
  <int> <fct> <int> <int> <int>
1     8 B         8   188     6
2     8 B         9   188     6
3     8 B        10    NA    NA
4     8 B        11   188     7
5    19 A        11   202    60
6    19 A        12    NA    NA
7    19 A        13   202    18
8    19 A        14    NA    NA
9    19 A        15   202    27
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • `df %>% group_by(ID, site, day) %>% complete(hr = seq(min(hr), max(hr)), fill = list(nObs = 0))` may correspond more to the potential expected output (though of course we have no means of knowing that). – arg0naut91 Dec 17 '18 at 15:27
  • Thanks @arg0naut, your addition works perfectly to replace all the NA's with 0's. – tnt Dec 17 '18 at 15:46
1

One way to do this would be to first build a "template" of all possible combinations where birds can be observed and then merge ("left join") the actual observations onto that template:

a = read.table(text = "    ID   site day  hr  nObs
1    19  A    202  11  60
2    19  A    202  13  18
3    19  A    202  15  27
4     8  B    188  8   6
5     8  B    188  9   6
6     8  B    188  11  7")


tpl <- expand.grid(c(unique(a[, 1:3]), list(hr = 1:24)))
merge(tpl, a, all.x = TRUE)

Edit based on comment by @user3220999: in case we want to do the process per ID, we can just use split to get a list of data.frames per ID, get a list of templates and mapply merge on the two lists:

a <- split(a, a$ID)
tpl <- lapply(a, function(ai) {
  expand.grid(c(unique(ai[, 1:3]), list(hr = 1:24))) 
})
res <- mapply(merge, tpl, a, SIMPLIFY = FALSE, MoreArgs = list(all.x = TRUE)) 
Jozef
  • 2,617
  • 14
  • 19
  • I did think about this, but our sampling was very variable across the 3 month season, so it would probably have to be customized for each individual. – tnt Dec 17 '18 at 15:39
  • I edited my answer to be applicable per ID, we could of course do a similar approach with different splits. – Jozef Dec 17 '18 at 15:49