Fill gaps using a group mean in R

Question

I have a data set which has gaps in one of the columns (temp). I am trying to fill the gaps using the "temp" data from a "sensor" or mean of "sensors" within the same "treatment", and of course same date stamp. I am trying to do this using tidyverse/lubridate.

date    treatment   sensor  temp
1/01/2019   1   A   30
2/01/2019   1   A   29.1
3/01/2019   1   A   21.2
4/01/2019   1   A   NA
1/01/2019   1   B   20.5
2/01/2019   1   B   19.8
3/01/2019   1   B   35.1
4/01/2019   1   B   23.5
1/01/2019   2   C   31.2
2/01/2019   2   C   32.1
3/01/2019   2   C   28.1
4/01/2019   2   C   31.2
1/01/2019   2   D   NA
2/01/2019   2   D   26.5
3/01/2019   2   D   27.9
4/01/2019   2   D   28

This is what I am expecting:

date    treatment   sensor  temp
1/01/2019   1   A   30
2/01/2019   1   A   29.1
3/01/2019   1   A   21.2
4/01/2019   1   A   23.5
1/01/2019   1   B   20.5
2/01/2019   1   B   19.8
3/01/2019   1   B   35.1
4/01/2019   1   B   23.5
1/01/2019   2   C   31.2
2/01/2019   2   C   32.1
3/01/2019   2   C   28.1
4/01/2019   2   C   31.2
1/01/2019   2   D   31.2
2/01/2019   2   D   26.5
3/01/2019   2   D   27.9
4/01/2019   2   D   28

Many thanks for your help.

https://stackoverflow.com/questions/34517370/group-by-into-fill-not-working-as-expected — BENY, May 14 '19 at 00:57
Did you mean to provide some values in the temp column? Are you trying to carry a value forward or backward in time or merely carry a value across from another column? — icj, May 14 '19 at 01:09
`tidyr::fill()` is quite difficult to use in this example because when the data is grouped by date and treatment, one of the fills is "down" and the other "up". — neilfws, May 14 '19 at 01:12

score 1 · Answer 1 · answered May 14 '19 at 02:31

Another option with na.aggregate from zoo

library(dplyr)
library(zoo)
df %>% 
   group_by(date, treatment) %>%
   mutate(temp = na.aggregate(temp))
# A tibble: 16 x 4
# Groups:   date, treatment [8]
#   date      treatment sensor  temp
#   <fct>         <int> <fct>  <dbl>
# 1 1/01/2019         1 A       30  
# 2 2/01/2019         1 A       29.1
# 3 3/01/2019         1 A       21.2
# 4 4/01/2019         1 A       23.5
# 5 1/01/2019         1 B       20.5
# 6 2/01/2019         1 B       19.8
# 7 3/01/2019         1 B       35.1
# 8 4/01/2019         1 B       23.5
# 9 1/01/2019         2 C       31.2
#10 2/01/2019         2 C       32.1
#11 3/01/2019         2 C       28.1
#12 4/01/2019         2 C       31.2
#13 1/01/2019         2 D       31.2
#14 2/01/2019         2 D       26.5
#15 3/01/2019         2 D       27.9
#16 4/01/2019         2 D       28

data

df <- structure(list(date = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/01/2019", 
"2/01/2019", "3/01/2019", "4/01/2019"), class = "factor"), treatment = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    sensor = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
    3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"
    ), class = "factor"), temp = c(30, 29.1, 21.2, NA, 20.5, 
    19.8, 35.1, 23.5, 31.2, 32.1, 28.1, 31.2, NA, 26.5, 27.9, 
    28)), class = "data.frame", row.names = c(NA, -16L))

Simon Woodward · Accepted Answer · 2019-05-14T03:02:58.547

0

How about this:

df <- df %>%
group_by(date, treatment) %>%
mutate(
  fill = mean(temp, na.rm=TRUE), # value to fill in blanks
  temp2 = case_when(!is.na(temp) ~ temp,
                    TRUE ~ fill)
  )

edited May 14 '19 at 03:02

answered May 14 '19 at 01:22

Simon Woodward

1,946
1
16
24

1

This worked nicely for what I was trying to achieve. Many thanks – hizjamali May 14 '19 at 03:23
You could also use `if_else` in this case, which is more compact, but I tend to use `case_when` most of the time because it's more flexible. – Simon Woodward May 14 '19 at 18:10
1

I also prefer using case_when – hizjamali May 15 '19 at 01:43

score 0 · Answer 3 · answered May 14 '19 at 02:03

Here is one option using map2_dbl from purrr. We group_by treatment and replace NA temp with the first non-NA temp with the same date in the group.

library(dplyr)
library(purrr)

df %>%
  group_by(treatment) %>%
  mutate(temp = map2_dbl(temp, date, ~if (is.na(.x)) 
                    temp[which.max(date == .y & !is.na(temp))] else .x))

#   date      treatment sensor  temp
#   <fct>         <int> <fct>  <dbl>
# 1 1/01/2019         1 A       30  
# 2 2/01/2019         1 A       29.1
# 3 3/01/2019         1 A       21.2
# 4 4/01/2019         1 A       23.5
# 5 1/01/2019         1 B       20.5
# 6 2/01/2019         1 B       19.8
# 7 3/01/2019         1 B       35.1
# 8 4/01/2019         1 B       23.5
# 9 1/01/2019         2 C       31.2
#10 2/01/2019         2 C       32.1
#11 3/01/2019         2 C       28.1
#12 4/01/2019         2 C       31.2
#13 1/01/2019         2 D       31.2
#14 2/01/2019         2 D       26.5
#15 3/01/2019         2 D       27.9
#16 4/01/2019         2 D       28

data

df <- structure(list(date = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1/01/2019", 
"2/01/2019", "3/01/2019", "4/01/2019"), class = "factor"), treatment = 
c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
sensor = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("A", "B", "C", "D"
), class = "factor"), temp = c(30, 29.1, 21.2, NA, 20.5, 
19.8, 35.1, 23.5, 31.2, 32.1, 28.1, 31.2, NA, 26.5, 27.9, 
28)), class = "data.frame", row.names = c(NA, -16L))

Fill gaps using a group mean in R

3 Answers3

data