1

I have a dataframe that looks like this:

DF<-data.frame(ID=c(seq(1,10, by = 1)),
               C1=c(9.1,0,9,9.2,9.1,0,0,9.3,9.1,9.0) ,
               grp = c(1,2,2,3,4,5,5,5,6,7)
               ) %>% 
  group_by(grp) %>% 
  mutate (
    C1m = mean(C1),
  )

The C1m values in groups 2 and 5 are unreliable because at least one C1 value is equal to 0. I am trying to get the variable C2 which for group 2 and 5 will be from the preceding group (1 and 4 respectively).

This is the output:

      ID    C1   grp   C1m    C2
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1   9.1     1   9.1   9.1
 2     2   0       2   4.5   9.1
 3     3   9       2   4.5   9.1
 4     4   9.2     3   9.2   9.2
 5     5   9.1     4   9.1   9.1
 6     6   0       5   3.1   9.1
 7     7   0       5   3.1   9.1
 8     8   9.3     5   3.1   9.1
 9     9   9.1     6   9.1   9.1
10    10   9       7   9     9 

Does anyone know an efficient way to solve this?

Thank you!

GrBa
  • 381
  • 1
  • 9
  • 1
    *"from the previous group to the next"*, I think you mean from group 1 to 2, from group 2 to 3, etc, right? Where does group 2's average of 4.5 go? – r2evans Jul 12 '23 at 17:27
  • Sorry for the imprecise question. I already corrected. Values 4.5 and 3.1 are unreliable and should be removed – GrBa Jul 12 '23 at 17:30

2 Answers2

1

Please try the below code

DF<-data.frame(ID=c(seq(1,10, by = 1)),
               C1=c(9.1,0,9,9.2,9.1,0,0,9.3,9.1,9.0) ,
               grp = c(1,2,2,3,4,5,5,5,6,7)
) %>% 
  mutate(C2x=ifelse(C1==0,NA_real_,C1)) %>% fill(C2x) %>% group_by(grp) %>% 
  mutate (
    C1m = round(mean(C1),digits = 1),
    C2 = round(mean(C2x),digits = 1),
  ) %>% select(-C2x)

Created on 2023-07-12 with reprex v2.0.2

# A tibble: 10 × 5
# Groups:   grp [7]
      ID    C1   grp   C1m    C2
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1   9.1     1   9.1   9.1
 2     2   0       2   4.5   9.1
 3     3   9       2   4.5   9.1
 4     4   9.2     3   9.2   9.2
 5     5   9.1     4   9.1   9.1
 6     6   0       5   3.1   9.2
 7     7   0       5   3.1   9.2
 8     8   9.3     5   3.1   9.2
 9     9   9.1     6   9.1   9.1
10    10   9       7   9     9  

jkatam
  • 2,691
  • 1
  • 4
  • 12
1

An alternate dplyr implementation:

ungroup(DF) %>%
  mutate(prev = lag(C1m)) %>%
  group_by(grp) %>%
  mutate(C2 = if (any(abs(C1) < 1e-9)) prev[1] else C1) %>%
  ungroup()
# # A tibble: 10 × 6
#       ID    C1   grp   C1m  prev    C2
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1     1   9.1     1   9.1  NA     9.1
#  2     2   0       2   4.5   9.1   9.1
#  3     3   9       2   4.5   4.5   9.1
#  4     4   9.2     3   9.2   4.5   9.2
#  5     5   9.1     4   9.1   9.2   9.1
#  6     6   0       5   3.1   9.1   9.1
#  7     7   0       5   3.1   3.1   9.1
#  8     8   9.3     5   3.1   3.1   9.1
#  9     9   9.1     6   9.1   3.1   9.1
# 10    10   9       7   9     9.1   9  

(and then %>% select(-prev) to clean up)

I use abs(C1) < 1e-9 instead of C1 == 0 due to Why are these numbers not equal? (aka R FAQ 7.31). Briefly, with floating-point numbers and how they are stored internally, it is not always guaranteed that strict equality can be met. It works fine in this case but I tend to recommend a slightly more defensive posture when dealing with non-integer numbers and tests like this.

Because the above relies 100% on the order of rows, here's an alternative that works when row-order is not guaranteed:

distinct(DF, grp, C1m) %>%
  mutate(grp = grp + 1) %>%
  left_join(DF, ., by = "grp", suffix = c("", ".y")) %>%
  mutate(C2 = if (any(abs(C1) < 1e-9)) C1m.y else C1m) %>%
  ungroup() %>%
  select(-C1m.y)
# # A tibble: 10 × 5
#       ID    C1   grp   C1m    C2
#    <dbl> <dbl> <dbl> <dbl> <dbl>
#  1     1   9.1     1   9.1   9.1
#  2     2   0       2   4.5   9.1
#  3     3   9       2   4.5   9.1
#  4     4   9.2     3   9.2   9.2
#  5     5   9.1     4   9.1   9.1
#  6     6   0       5   3.1   9.1
#  7     7   0       5   3.1   9.1
#  8     8   9.3     5   3.1   9.1
#  9     9   9.1     6   9.1   9.1
# 10    10   9       7   9     9  
r2evans
  • 141,215
  • 6
  • 77
  • 149