If I understand correctly, this can be solved by a non-equi self join using two helper columns:
library(data.table)
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
, conc20 := df[df, on = .(rn > rn, sample < threshold), mult = "first", x.date]][
, c("rn", "threshold") := NULL][]
date sample conc20
1: 2020-02-14 0.008470 2020-02-20
2: 2020-02-15 0.008460 2020-02-20
3: 2020-02-16 0.007681 2020-02-27
4: 2020-02-17 0.007144 2020-02-27
5: 2020-02-18 0.007262 2020-02-27
6: 2020-02-19 0.007300 2020-02-27
7: 2020-02-20 0.006604 <NA>
8: 2020-02-21 0.006843 2020-02-27
9: 2020-02-22 0.006687 2020-02-27
10: 2020-02-23 0.006991 2020-02-27
11: 2020-02-24 0.007333 2020-02-27
12: 2020-02-25 0.006738 2020-02-27
13: 2020-02-26 0.006279 <NA>
14: 2020-02-27 0.005300 <NA>
Explanation
The first condition in the on =
clause ensures that only succeeding rows are considered, the second condition looks for sample < threshold
where threshold
has been defined beforehand as 80% of sample
. The helper column rn
contains row numbers (created via the data.table special symbol .I
). In addition, mult = "first"
tells to pick the first occurrence in case of multiple matches.
The result is appended as additional column conc20
by reference, i.e., without copying the whole dataset. Finally, the two helper columns are removed by reference.
Note that data.table chaining is used.
For demonstration, the result of the non-equi self join including all helper columns can be shown:
setDT(df)[, rn := .I][, threshold := 0.8 * sample][
df, on = .(rn > rn, sample < threshold), mult = "first"]
date sample rn threshold i.date i.sample
1: 2020-02-20 0.0067760 1 0.0052832 2020-02-14 0.008470
2: 2020-02-20 0.0067680 2 0.0052832 2020-02-15 0.008460
3: 2020-02-27 0.0061448 3 0.0042400 2020-02-16 0.007681
4: 2020-02-27 0.0057152 4 0.0042400 2020-02-17 0.007144
5: 2020-02-27 0.0058096 5 0.0042400 2020-02-18 0.007262
6: 2020-02-27 0.0058400 6 0.0042400 2020-02-19 0.007300
7: <NA> 0.0052832 7 NA 2020-02-20 0.006604
8: 2020-02-27 0.0054744 8 0.0042400 2020-02-21 0.006843
9: 2020-02-27 0.0053496 9 0.0042400 2020-02-22 0.006687
10: 2020-02-27 0.0055928 10 0.0042400 2020-02-23 0.006991
11: 2020-02-27 0.0058664 11 0.0042400 2020-02-24 0.007333
12: 2020-02-27 0.0053904 12 0.0042400 2020-02-25 0.006738
13: <NA> 0.0050232 13 NA 2020-02-26 0.006279
14: <NA> 0.0042400 14 NA 2020-02-27 0.005300
Data
library(data.table)
df <- fread("
i date sample
591 2020-02-14 0.008470
590 2020-02-15 0.008460
589 2020-02-16 0.007681
588 2020-02-17 0.007144
587 2020-02-18 0.007262
586 2020-02-19 0.007300
585 2020-02-20 0.006604
584 2020-02-21 0.006843
583 2020-02-22 0.006687
582 2020-02-23 0.006991
581 2020-02-24 0.007333
580 2020-02-25 0.006738
579 2020-02-26 0.006279
580 2020-02-27 0.005300
", drop = 1L)