0

Given the following dataframe :

set.seed(1)

my_df = data.frame(x = rep(words[1:5], 50) %>% sort(),
                 y = 1:250,
                 z = sample(seq(from = 30 , to = 90, by = 0.1), size = 250, replace = T)) 




my_df %>% head(30)
   x  y    z
1  a  1 45.9
2  a  2 52.3
3  a  3 64.4
4  a  4 84.5
5  a  5 42.1
6  a  6 83.9
7  a  7 86.7
8  a  8 69.7
9  a  9 67.8
10 a 10 33.7
11 a 11 42.3
12 a 12 40.6
13 a 13 71.2
14 a 14 53.0
15 a 15 76.2
16 a 16 59.9
17 a 17 73.1
18 a 18 89.6
19 a 19 52.8
20 a 20 76.7
21 a 21 86.1
22 a 22 42.7
23 a 23 69.1
24 a 24 37.5
25 a 25 46.0
26 a 26 53.2
27 a 27 30.8
28 a 28 52.9
29 a 29 82.2
30 a 30 50.4

I would like to create the following column using dplyr mutate

for each value In column z show the row index of the first value in z which is lower.

For example:

for row 8 show 5

for row 22 show 12

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Rafael Zanzoori
  • 531
  • 1
  • 7
  • 23
  • 1
    But for row 8 the entry in row 3 is the closest lower value, is it not: `my_df$z[8] = 69.7` vs. `my_df$z[3] = 64.4`? Why do you expect row 5 `my_df[5] = 42.1`? – Maurits Evers Jun 25 '18 at 08:52
  • Sorry Maybe i wasn't clear enough.. i meant that for each value z[i] go backwards and show the row index of the first value which is lower. if it's clearer now i'll update my post – Rafael Zanzoori Jun 25 '18 at 09:04

2 Answers2

1

I'm not sure how to do this using dplyr, but here is a data.table attempt using a self non-equi join

library(data.table)

setDT(my_df) %>% #convert to data.table
  # Run a self non-equi join and find the closest lower value
  .[., .N - which.max(rev(z < i.z)) + 1L, on = .(y <= y), by = .EACHI] %>%
  # filter the cases where there are no such values
  .[y != V1] %>%
  # join the result back to the original data
  my_df[., on = .(y), res := V1]

head(my_df, 22)
#     x  y    z res
#  1: a  1 45.9  NA
#  2: a  2 52.3   1
#  3: a  3 64.4   2
#  4: a  4 84.5   3
#  5: a  5 42.1  NA
#  6: a  6 83.9   5
#  7: a  7 86.7   6
#  8: a  8 69.7   5
#  9: a  9 67.8   5
# 10: a 10 33.7  NA
# 11: a 11 42.3  10
# 12: a 12 40.6  10
# 13: a 13 71.2  12
# 14: a 14 53.0  12
# 15: a 15 76.2  14
# 16: a 16 59.9  14
# 17: a 17 73.1  16
# 18: a 18 89.6  17
# 19: a 19 52.8  12
# 20: a 20 76.7  19
# 21: a 21 86.1  20
# 22: a 22 42.7  12
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
0

I have managed to find a dplyr solution inspired by a solution given to one of my other questions using rollapply in this link.

set.seed(1)

my_df = data.frame(x = rep(words[1:5], 50) %>% sort(),
                   y = 1:250,
                   z = sample(seq(from = 30 , to = 90, by = 0.1), size = 250, replace = T)) 


my_df %>%
  mutate(First_Lower_z_Backwards = row_number() - rollapply(z, 
                                                            width = list((0:(-n()))), 
                                                            FUN = function(x) which(x < x[1])[1] - 1,
                                                            fill = NA,
                                                            partial = T)) %>%
  head(22)


   x  y    z First_Lower_z_Backwards
1  a  1 45.9                      NA
2  a  2 52.3                       1
3  a  3 64.4                       2
4  a  4 84.5                       3
5  a  5 42.1                      NA
6  a  6 83.9                       5
7  a  7 86.7                       6
8  a  8 69.7                       5
9  a  9 67.8                       5
10 a 10 33.7                      NA
11 a 11 42.3                      10
12 a 12 40.6                      10
13 a 13 71.2                      12
14 a 14 53.0                      12
15 a 15 76.2                      14
16 a 16 59.9                      14
17 a 17 73.1                      16
18 a 18 89.6                      17
19 a 19 52.8                      12
20 a 20 76.7                      19
21 a 21 86.1                      20
22 a 22 42.7                      12
Rafael Zanzoori
  • 531
  • 1
  • 7
  • 23