First highest value in each frame in a list and add into a new column

Question

I'm trying to find the first time 'price' is higher than another value called 'dayhigh' per day.

I've had issues with getting this result into a time series object so I'm just using the POSIXlt class for datetime and the day for reference is in Date class. Example data is in a frame called 'example':

day,datetime,price,dayhigh
2016-09-01,2016-09-01 15:00:00,1.11912,1.11990
2016-09-01,2016-09-01 15:00:00,1.13000,1.11990
2016-09-01,2016-09-01 15:00:01,1.11911,1.11990
2016-09-05,2016-09-05 15:00:00,1.11436,1.11823
2016-09-05,2016-09-05 15:00:01,1.11436,1.11823
2016-09-05,2016-09-05 15:00:01,1.11900,1.11823
2016-09-05,2016-09-05 15:00:01,1.11436,1.11823
2016-09-06,2016-09-06 15:00:00,1.12383,1.12557
2016-09-06,2016-09-06 15:00:00,1.12382,1.12557
2016-09-06,2016-09-06 15:00:00,1.12382,1.12557
2016-09-06,2016-09-06 15:00:00,1.12384,1.12557
2016-09-06,2016-09-06 15:00:00,1.12384,1.12557
2016-09-06,2016-09-06 15:00:00,1.12558,1.12557
2016-09-06,2016-09-06 15:00:01,1.12559,1.12557

df = data.frame(
  day = c("2016-09-01", "2016-09-01", "2016-09-01", "2016-09-05", "2016-09-05", 
          "2016-09-05", "2016-09-05", "2016-09-06", "2016-09-06", "2016-09-06", 
          "2016-09-06", "2016-09-06", "2016-09-06", "2016-09-06"),
  datetime = c("2016-09-01 15:00:00", "2016-09-01 15:00:00", "2016-09-01 15:00:01", 
               "2016-09-05 15:00:00", "2016-09-05 15:00:01", "2016-09-05 15:00:01",
               "2016-09-05 15:00:01", "2016-09-06 15:00:00", "2016-09-06 15:00:00", 
               "2016-09-06 15:00:00", "2016-09-06 15:00:00", "2016-09-06 15:00:00", 
               "2016-09-06 15:00:00", "2016-09-06 15:00:01"),
  price = c(1.11912, 1.13, 1.11911, 1.11436, 1.11436, 1.119, 1.11436,
            1.12383, 1.12382, 1.12382, 1.12384, 1.12384, 1.12558, 1.12559), 
  dayhigh = c(1.1199, 1.1199, 1.1199, 1.11823, 1.11823, 1.11823, 1.11823,
              1.12557, 1.12557, 1.12557, 1.12557, 1.12557, 1.12557, 1.12557)
)

One idea I've had is to split the frame by day into a list of frames:

exlist <- split(example, as.Date(example$day))

This returns a list of objects.

What I'd like to do is to use which.max on each frame object within the list and add a 'TRUE' into a new column in each frame for the row where the day's first high happens. The day's first high defined as the first price > dayhigh for each day.

From there I can concatenate back into a single frame and perform further analysis.

@Frank: the tags `dataframe, data.table, max, which` were all relevant, please stop reverting them. — smci, May 01 '17 at 01:22

Mike H. · Answer 1 · 2017-05-01T00:09:20.857

3

No need to go through all that work, you could use data.table to do it in one step:

library(data.table)
setDT(df)
df[ , first_high := (seq_len(.N) == which(price > dayhigh)[1]), by = day ]

df
#           day            datetime   price dayhigh first_high
# 1: 2016-09-01 2016-09-01 15:00:00 1.11912 1.11990      FALSE
# 2: 2016-09-01 2016-09-01 15:00:00 1.13000 1.11990       TRUE
# 3: 2016-09-01 2016-09-01 15:00:01 1.11911 1.11990      FALSE
# 4: 2016-09-05 2016-09-05 15:00:00 1.11436 1.11823      FALSE
# 5: 2016-09-05 2016-09-05 15:00:01 1.11436 1.11823      FALSE
# 6: 2016-09-05 2016-09-05 15:00:01 1.11900 1.11823       TRUE
# 7: 2016-09-05 2016-09-05 15:00:01 1.11436 1.11823      FALSE
# 8: 2016-09-06 2016-09-06 15:00:00 1.12383 1.12557      FALSE
# 9: 2016-09-06 2016-09-06 15:00:00 1.12382 1.12557      FALSE
#10: 2016-09-06 2016-09-06 15:00:00 1.12382 1.12557      FALSE
#11: 2016-09-06 2016-09-06 15:00:00 1.12384 1.12557      FALSE
#12: 2016-09-06 2016-09-06 15:00:00 1.12384 1.12557      FALSE
#13: 2016-09-06 2016-09-06 15:00:00 1.12558 1.12557       TRUE
#14: 2016-09-06 2016-09-06 15:00:01 1.12559 1.12557      FALSE

edited May 01 '17 at 00:09

answered Apr 30 '17 at 23:45

Mike H.

13,960
2
29
39

1

better is `seq_len(.N) == which.max(price > dayhigh)` – MichaelChirico Apr 30 '17 at 23:57
Thanks @MichaelChirico! Updated. `seq_len` will be faster, but for my own curiosity are there any other benefits as well? – Mike H. May 01 '17 at 00:01
More important is [avoiding `ifelse`](http://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow) – MichaelChirico May 01 '17 at 00:07
Ahh, I see. Thanks for the tip! – Mike H. May 01 '17 at 00:09
happy to help!! – MichaelChirico May 01 '17 at 00:10
Worked first time. Cheers! – nycrefugee May 01 '17 at 00:26
1

@Mike H's very neat answer adapted for dplyr `df %>% group_by(day) %>% mutate (first_high = (seq_along(price) == which(price > dayhigh)[1]))` – Andrew Lavers May 01 '17 at 00:52
1

To decode the cleverness for the rest of us: `which/which.max()` returns a variable-length vector of indices, then `seq_len(.N) == which(...)` converts that back into a fixed-length vector of booleans. – smci May 01 '17 at 01:03

score 3 · Accepted Answer · answered May 01 '17 at 00:30

3

I have another solution based on data.table.

library(data.table)
setDT(example)
example[, first.high:= (.I == .I[which.max(price>dayhigh)]), by=day ]

answered May 01 '17 at 00:30

Yannis Vassiliadis

1,719
8
14

score 1 · Answer 3 · answered May 01 '17 at 00:00

You could use ave twice.

#1) To make sure price is greater than dayhigh

#2) To make sure it's happening for the first time for a given sub-group.

ave(1:NROW(df), df$day, FUN = function(i) df$price[i] > df$dayhigh[i]) & #1
    ave(1:NROW(df), df$day, FUN = function(i) cumsum(df$price[i] > df$dayhigh[i]) == 1) #2
#[1] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

First highest value in each frame in a list and add into a new column

3 Answers3