filter rows based on a value from a df +/- another value in R

Question

In this df `wb` I calculate the mean of T based on two conditions, C2 == "B" & C3 == "AS1". Then I want to filter my data based on the calculated TmeanAS1 plus or minus 1. I will then do the same for TmeanAS2, calculated from C2 == "B" & C3 == "AS2", and so on. I need to end up with a wb that has only rows with a T value in AS1 that is within TmeanAS1 +/- 1, a T value in AS2 that is within TmeanAS2 +/- 1, etc.

# A tibble: 30 x 4
      C1 C2    C3        T
   <dbl> <chr> <chr> <dbl>
 1     1 A     AS1    61.5
 2     2 A     AS1    61.6
 3     3 A     AS1    61.9
 4     4 B     AS1    70.9
 5     5 B     AS1    70.9
 6     6 B     AS1    70.9
 7     7 B     AS1    70.7
 8     8 C     AS1    70.9
 9     9 C     AS1    70.9
10    10 C     AS1    70.9
# … with 20 more rows

structure(list(C1 = c(1, 2, 3, 4, 5, 6), C2 = c("A", "A", "A", 
"B", "B", "B"), C3 = c("AS1", "AS1", "AS1", "AS1", "AS1", "AS1"
), T = c(61.5034980773926, 61.6354866027832, 61.8994636535645, 
70.8747406005859, 70.8747406005859, 70.8747406005859)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

My code returns a df with the right Tmean, but the +/- step doesn't work. I could also mention that TmeanAS1 doesn't need to be a df.

TmeanAS1 <- wb %>% filter(C2 == "B" & C3 == "AS1") %>% summarise(TmeanAS1=mean(T))

>TmeanAS1
  TmeanAS1
1 70.84174

wb_filtered <- wb %>% filter(T<(TmeanAS1$TmeanAS1 %+-% 1))

Where is `%+-%` defined? Besides `dplyr`, what other non-base packages are you using? What does your data look like? What is your expected output given the input data (that we don't know). — r2evans, Feb 18 '21 at 14:31
My first guess is that either `%+-%` doesn't exist, or if it does, then it is looking for vectors of equal size. It is almost certain that `Tmean` and `wb` have different numbers of rows, so something like `1:10 < (2:4 %+-% 1)` is way too confusing. — r2evans, Feb 18 '21 at 14:33
I think you can likely adapt to using `data.table::between` (since `dplyr::between` doesn't do vectors in the left/right values ... a significant mistake, imo). But you'll need to resolve how each row of `wb` aligns with each `Tmean$Tmean`. — r2evans, Feb 18 '21 at 14:34
I'll wrap this up by first saying ... welcome to SO, catinstack! Questions on SO do much better when they are reproducible and self-contained, including sample *unambiguous* data (R console display can be ambiguous or difficult for us to parse), code attempted (you have that here), literal text from warnings/errors you receive, and the expected output given the sample data. No pictures please, we generally don't do transcription. See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks! — r2evans, Feb 18 '21 at 14:37
I am quite new as you have understood. Indeed the two df don't have the same nr of rows. Could you suggest another easy way to solve this? Tmean doesn't have to be a df. It is only a mean calculated, therefore the df contains only 1 value — jkinstack, Feb 18 '21 at 14:44
Please read my fourth comment, it is not needless banter. Namely, without sample data and expected output, I will not (and perhaps most will not) take much effort to guess what is going on. Even if somebody does, it might not be appropriate for your data. Please make it easier for us to help you by providing sample data. — r2evans, Feb 18 '21 at 14:46
Starting point: [edit] your question and add the output from `dput(head(wb))`, then for at least one or two rows, say what you expect to happen. — r2evans, Feb 18 '21 at 15:18
@r2evans I tried to rephrase completely and provide all possible info. Hope it helps — jkinstack, Feb 18 '21 at 16:35

r2evans · Accepted Answer · 2021-02-18T18:43:52.110

0

Perhaps this:

(TmeanAS1 <- mean(filter(wb, C2 == "B" & C3 == "AS1")$T))
# [1] 70.87474
wb %>%
  filter(between(T, TmeanAS1 - 1, TmeanAS1 + 1))
# # A tibble: 3 x 4
#      C1 C2    C3        T
#   <dbl> <chr> <chr> <dbl>
# 1     4 B     AS1    70.9
# 2     5 B     AS1    70.9
# 3     6 B     AS1    70.9

And while I don't recommend this as a practical function to keep around, let me demonstrate the creation and use of R's inline operators, denoted by a %-sandwich:

`%+-1%` <- function(a, b) (a >= b-1) & (a <= b+1)
wb %>%
  filter(T %+-1% TmeanAS1)
# # A tibble: 3 x 4
#      C1 C2    C3        T
#   <dbl> <chr> <chr> <dbl>
# 1     4 B     AS1    70.9
# 2     5 B     AS1    70.9
# 3     6 B     AS1    70.9

The backticks in the operator (function) definition are required, because otherwise R would try to parse `%+-1%` as an infix operator call (with missing operands) rather than as an object name to assign to.
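To make the backtick point concrete, here is a minimal sketch (the vector `x` and the reference value 71 are made up for illustration). It shows that the infix form and the ordinary function-call form of a %-operator are the same thing:

```r
# Define a custom infix operator; the name is quoted with backticks
# only when it is used as an object name (assignment or prefix call).
`%+-1%` <- function(a, b) (a >= b - 1) & (a <= b + 1)

# Hypothetical values, not taken from the question's data.
x <- c(69.5, 70.2, 71.0, 72.1)

infix_result  <- x %+-1% 71        # infix form, no backticks needed
prefix_result <- `%+-1%`(x, 71)    # same call written as a regular function

identical(infix_result, prefix_result)
```

Only the elements within one unit of 71 come back `TRUE`, and both calling styles give the same logical vector.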

But lastly, we don't need any "between" logic, in fact:

filter(wb, abs(T - TmeanAS1) <= 1)

edited Feb 18 '21 at 18:43

answered Feb 18 '21 at 17:05

r2evans


Is there a smart way to filter all at the same time? I have 12 different AS levels in C3, and in the end I will need to keep all rows from e.g. filter(T %+-1% TmeanAS1), filter(T %+-1% TmeanAS2), filter(T %+-1% TmeanAS3) etc. I was thinking of `rbind` – jkinstack Feb 19 '21 at 14:18
Your question and sample data don't go into that at all, and while I might have a hint at what you're doing, representative data and expected output given that data are much clearer. – r2evans Feb 19 '21 at 15:23
I see - I guess it has to be another post? I can see I describe it, but it's not obvious from the data. Regarding the last answer, I realised that `wb %>% filter(between(T, TmeanAS1 - 1, TmeanAS1 + 1))` doesn't really solve my problem, as it filters all the T values against TmeanAS1. Can one add a condition such as `C3 == "AS1"`? – jkinstack Feb 19 '21 at 16:07
I don't know how we are supposed to infer anything differently, when you create `Tmean` as a single value. I know it's difficult to generate a representative question that meets all of your needs, but yes, I think a new question. I'm inferring you should be including `group_by` in generating `Tmean`, suggesting a range-based join (which means `fuzzyjoin` or `data.table`, likely). – r2evans Feb 19 '21 at 16:18
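For readers with the same follow-up, one possible shape of the `group_by` idea hinted at in the last comment is sketched below. It is not from either answer: it assumes the reference mean for each C3 level should come from the C2 == "B" rows only, and it uses a plain equi-join on C3 rather than a range-based join. The sample values are invented for illustration.

```r
library(dplyr)

# Hypothetical sample data with two C3 levels (values invented for illustration).
wb <- tibble(
  C1 = 1:8,
  C2 = c("A", "B", "B", "C", "A", "B", "B", "C"),
  C3 = c("AS1", "AS1", "AS1", "AS1", "AS2", "AS2", "AS2", "AS2"),
  T  = c(61.5, 70.9, 70.7, 70.5, 50.0, 55.2, 55.4, 57.0)
)

# One reference mean per C3 level, computed from the C2 == "B" rows only.
ref_means <- wb %>%
  filter(C2 == "B") %>%
  group_by(C3) %>%
  summarise(Tmean = mean(T), .groups = "drop")

# Attach each row's reference mean, then keep rows within +/- 1 of it.
wb_filtered <- wb %>%
  inner_join(ref_means, by = "C3") %>%
  filter(abs(T - Tmean) <= 1)

wb_filtered
```

This handles every C3 level in one pass, so no `rbind` of twelve separate filters is needed. If the reference mean should be defined differently, only the `ref_means` step has to change.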

score 0 · Answer 2 · answered Feb 18 '21 at 17:05

T is a really bad name for a column because in R `T` is a built-in shorthand for logical TRUE, so I've renamed your T to "Tee".

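library(dplyr)

df <- rename(wb, Tee = T)  # rename the column as described above

df %>% 
  group_by(C2, C3) %>%                 # one group per C2/C3 combination
  mutate(Tmean = mean(Tee)) %>%        # group mean attached to every row
  filter(Tee <= Tmean + 1 & Tee >= Tmean - 1)  # both bounds must hold: & rather than |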

You can just group by C2 and C3 both at once and then calculate the Tmean for the whole lot and filter everything.

answered Feb 18 '21 at 17:05

gladys_c_hugh


I agree fully ... naming variables after base R primitives such as `T` is bad practice; while R will *usually* (not always) know which you mean, it can be rather difficult to separate them cognitively when debugging code. (But, another recommendation ... always use `TRUE`, never use R's abbreviated `T`/`F`. :-) – r2evans Feb 18 '21 at 17:07
thanks for the answer. But, this doesn't take into account the condition for the C2 and C3, but it calculates the mean of all values of T – jkinstack Feb 18 '21 at 18:31
You can then just filter on the conditions if you want to by piping the above into `filter(C2 == "B" & C3 == "AS1")`, or whatever combinations of C2 and C3 you want. – gladys_c_hugh Feb 19 '21 at 20:03
Personally, I think my answer is better than the one accepted, because you can write the data to the environment, and then update the filter as required. You did ask for "...T value in AS21 which is equal to the TmeanAS2 +/- 1 etc.", which my answer supplies easily, whereas the accepted answer requires you to update the filters for C2 and C3, change the stored value name, then update the references to the stored values in the next filter. – gladys_c_hugh Feb 22 '21 at 19:02
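
As a footnote to the point raised above about `T` being a poor name, here is a small sketch of why the abbreviation is fragile. It relies only on base R: `T` and `F` are ordinary variables that can be overwritten, whereas `TRUE` and `FALSE` are reserved words.

```r
# T starts out as a base variable holding TRUE, but it can be masked.
T <- 0
t_is_true <- isTRUE(T)     # FALSE: T now refers to the numeric 0

# Removing the local copy makes the base definition visible again.
rm(T)
t_restored <- isTRUE(T)    # TRUE
```

Inside `filter()` or `mutate()` a column named `T` takes precedence over both, which is part of why the two are easy to confuse when debugging.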
