2

Question:

In a dataframe, I want to create a new column as the indices of the next smaller value of an existing column.

For example, the data looks like this. It is already arranged in item, day.

  item day val 
1    1   2   3 
2    1   4   2 
3    1   5   1 
4    2   1   1 
5    2   3   2 
6    2   5   3 

First I would like to use group_by(item) in dplyr to select the sub-dataframe of each item.

Then for row 1, I look down the rows and find that row 2 has a smaller val. This is what I want, so I record the day corresponding to that row. Similar for row 2.

Note that for row 3 and 6, they are the last rows of corresponding sub-dataframes, so there is no next smaller value. For row 4 and 5, there is no smaller val when I look down the rows.

The dataframe with the new column should look like this.

  item day val next.smaller.day
1    1   2   3                4
2    1   4   2                5
3    1   5   1               -1
4    2   1   1               -1
5    2   3   2               -1
6    2   5   3               -1

I wonder if there is any way of using dplyr to implement this, or any codes in r other than a for loop.

I found a thread asking the algorithm of this question. Given an array, find out the next smaller element for each element . It is relevant, and the proposed algorithm beats mine in terms of time complexity, but I still find it hard to implement in my scenario.

Thank you!

Update:

Here is another example to re-illustrate what I'm looking for.

  item day val next.smaller.day
1    1   2   2                5
2    1   4   3                5
3    1   5   1               -1
4    2   1   3                3
5    2   3   1               -1
6    2   5   2               -1
Community
  • 1
  • 1
Richard X.
  • 43
  • 4

1 Answers1

1

You can group your data by the item, calculate the different between rows using the diff function and check if it is smaller than zero which will then generate a logic vector and you can use the logic vector to pick up the next day. And since you are picking up the next day, you will need the lead function to shift the day column forward so that it can match the rows where you want to place them.

Side note: Since diff function create a vector one element shorter than the original one and you will always leave the last row out per group, we can pad the diff result by a FALSE condition.

library(dplyr);
df %>% group_by(item) %>% mutate(smaller = c(diff(val) < 0, F), 
                                 next.smaller.day = ifelse(smaller, lead(day), -1)) %>%
       select(-smaller)

# Source: local data frame [6 x 4]
# Groups: item [2]

#    item   day   val next.smaller.day
#   <int> <int> <int>            <dbl>
# 1     1     2     3                4
# 2     1     4     2                5
# 3     1     5     1               -1
# 4     2     1     1               -1
# 5     2     3     2               -1
# 6     2     5     3               -1

Update:

find.next.smaller <- function(ini = 1, vec) {
    if(length(vec) == 1) NA 
    else c(ini + min(which(vec[1] > vec[-1])), 
          find.next.smaller(ini + 1, vec[-1]))
}       # the recursive function will go element by element through the vector and find out 
        # the index of the next smaller value.

df %>% group_by(item) %>% mutate(next.smaller.day = day[find.next.smaller(1, val)], 
                                 next.smaller.day = replace(next.smaller.day, is.na(next.smaller.day), -1)) 

# Source: local data frame [6 x 4]
# Groups: item [2]
#
#    item   day   val next.smaller.day
#   <int> <int> <dbl>            <dbl>
# 1     1     2     2                5
# 2     1     4     3                5
# 3     1     5     1               -1
# 4     2     1     1               -1
# 5     2     3     2               -1
# 6     2     5     3               -1
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Thanks for your reply. What you did works in the given example. However, it is possible that the next smaller value is not in the next row. For example, if `val` is `2,3,1` and `day` is `2,4,5`, I expect the output to be `5,5,-1`. Do you have any idea how to implement this? – Richard X. Jul 06 '16 at 04:09
  • See the update. You can write a recursive function to find the index of the next smaller value, then apply it using the `dplyr` function. – Psidom Jul 06 '16 at 13:56
  • Thank you very much! I tried to write a function to achieve my goal, but I couldn't figure it out. Your solution works very well. Let me learn the spirit of the recursive nature of your function and try to apply it next time. Many thanks again! – Richard X. Jul 06 '16 at 18:45