Extract rows for first occurrence of a variable prior to the occurrence of an event

Question

I'm trying to extract the first occurrence of a value in a data frame PRIOR to a specific event that I have already selected in that data frame. Specifically, the output of head(df) is:

date      discharge       event  event.isolation  some.column
1/1/2016   7.782711          NA  NA               FALSE
1/2/2016   7.349389   -5.567748  none             TRUE
1/3/2016   7.053813   -4.021769  none             TRUE
1/4/2016   7.421568    5.213554  none             TRUE
1/5/2016   5.722443  -22.894418  none             TRUE
1/6/2016   5.497342   -3.933662  none             TRUE
1/7/2016   5.347890   -6.898281  none             TRUE
1/8/2016   7.983489    4.289382  none             TRUE
1/9/2016   8.488293   -19.28304  none             TRUE

I'd like to find the date of the first discharge value of 7.7 or greater before each event of -22 or less. In other words, I know each event of interest; I would like to iteratively search backwards to find the first discharge value of 7.7 or greater prior to each selected event.

I'm basically trying to combine "Extract rows for the first occurrence of a variable in a data frame" with "Select row prior to first occurrence of an event by group", but am having difficulty doing so.

The desired result would be df[1, ] as it contains the first discharge value (working backwards) that exceeds 7.7, prior to the event in row 5 that I've selected.

if you show your expected output answers will come easier and in better quality — s_baldur, Sep 03 '18 at 14:27
1. post result of `dput(head(df))` 2. show your **desired result** like the example data — Andre Elrico, Sep 03 '18 at 14:27
Your `df` seems to be missing the header of one of the columns. Also, could you provide a few extra lines? — P1storius, Sep 03 '18 at 15:20
@BrynnO'donnell a few extra lines where event < -22 would help, because solutions will usually work better if they come for as representative examples as possible — P1storius, Sep 03 '18 at 15:35

score 0 · Answer 1 · answered Sep 03 '18 at 15:33

This is not the most elegant solution, but it works for the example.

It first defines the intervals to search (one interval ending at each event < -22), then looks for the first occurrence of discharge > 7.7 within each interval.

I am assuming in this example that you do not want to return rows where event < -22 AND discharge > 7.7, even if that would be the first occurrence of discharge > 7.7 since the last event.

df <- read.table(text = 'date discharge     event event.isolation some.column
1 1/1/2016  7.782711         NA  <NA>           FALSE
2 1/2/2016  7.349389  -5.567748  none            TRUE
3 1/3/2016  7.053813  -4.021769  none            TRUE
4 1/4/2016  7.421568   5.213554  none            TRUE
5 1/5/2016  5.722443 -22.894418  none            TRUE
6 1/6/2016  5.497342  -3.933662  none            TRUE
7 1/7/2016  5.347890  -6.898281  none            TRUE
8 1/8/2016  7.983489   4.289382  none            TRUE',
  header = TRUE, na.strings = c("NA", "<NA>"))

## find the rows where event < -22; prepend 0 so that the first interval starts at row 1
d <- c(0, which(df$event < -22))

## Each interval runs from row d[i]+1 to row d[i+1]-1, so the event rows themselves are excluded.
## Empty intervals (two consecutive event rows) are skipped.
new.df <- NULL
for (i in seq_len(length(d) - 1)) {
  if (d[i+1] > (d[i] + 1)) {
    ## look only inside the interval and take the first row where discharge > 7.7
    ## (this is an all-NA row if no row in the interval qualifies)
    hit <- subset(df[(d[i]+1):(d[i+1]-1), ], discharge > 7.7)[1, ]
    ## append rather than overwrite, so the result for every event is kept
    new.df <- rbind(new.df, hit)
  }
}
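
If "working backwards" is meant literally, a variant that returns the nearest earlier row with discharge of 7.7 or greater for each event might look like the sketch below. This is a different rule from the loop above, which returns the first qualifying row since the previous event; on the example data both give row 1. The helper name `last_high_before` and its `threshold` argument are made up for illustration, the data are typed in by hand from the question, and the comparisons use `<=` and `>=` to follow the wording "or less" and "or greater".

```r
# Example data typed in from the question (only the columns that are needed)
df <- data.frame(
  date      = c("1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016", "1/5/2016",
                "1/6/2016", "1/7/2016", "1/8/2016", "1/9/2016"),
  discharge = c(7.782711, 7.349389, 7.053813, 7.421568, 5.722443,
                5.497342, 5.347890, 7.983489, 8.488293),
  event     = c(NA, -5.567748, -4.021769, 5.213554, -22.894418,
                -3.933662, -6.898281, 4.289382, -19.28304),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each event row, return the index of the nearest
# earlier row whose discharge meets the threshold (NA if there is none).
last_high_before <- function(discharge, event_rows, threshold = 7.7) {
  high <- which(discharge >= threshold)
  vapply(event_rows, function(e) {
    earlier <- high[high < e]          # qualifying rows strictly before the event
    if (length(earlier) > 0) max(earlier) else NA_integer_
  }, integer(1))
}

event_rows <- which(df$event <= -22)   # which() drops the NA in row 1
hits <- last_high_before(df$discharge, event_rows)
result <- df[hits[!is.na(hits)], ]     # one row per event that has a match
```

Here `event_rows` is 5 and `result` is `df[1, ]`, the row asked for in the question. Events with no earlier qualifying discharge are dropped from `result` rather than returned as NA rows.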
