R: Subsetting with two variables

Question

This is a follow-up to: R: Subsetting on increasing value to max excluding the decreasing

The posted solution works and now I would like to add a low cutoff based on a second variable. Thus far I'm not certain about how to approach this with data.table. As an example, I would like to restrict output to max of B and all values after the first instance of D == 1 by TrialNum. I assume this means extracting and using the index (using which?) associated with the low cutoff of D.

TrialNum,Obs,A,B,C,D
1,1,23,1,23,1
1,2,21,2,21,2
1,3,14,3,14,1
1,4,34,4,34,3
1,5,32,5,32,2
1,6,21,3,21,1
1,7,16,5,16,3
1,8,18,2,18,1
2,1,26,1,26,1
2,2,11,2,11,2
2,3,23,3,23,1
2,4,12,4,12,1
2,5,3,2,3,1
2,6,4,3,4,3
2,7,22,1,22,1
2,8,15,2,15,1

Expected output:

TrialNum,Obs,A,B,C,D
1,2,21,2,21,2
1,3,14,3,14,1
1,4,34,4,34,3
1,5,32,5,32,2
2,2,11,2,11,2
2,3,23,3,23,1
2,4,12,4,12,1

So, it's just the first instance of the low cutoff. I don't wish to lose data where D drops below the threshold after identifying the starting point. Following the solution posted yesterday, I've tried variations of using which in the expression to capture both max(B) and the low cutoff associated with D.

A data.table solution is preferable because data.table and dplyr currently seem to be incompatible on Windows with R 3.2.0.

Can you clarify what exactly you are after? Is this correct?: "I want all rows from (but not including) the first occurrence of D==1, to (and including) the row with the maximum value of B, for each TrialNum"? — mathematical.coffee, Jul 24 '15 at 02:18

score 0 · Accepted Answer · answered Jul 24 '15 at 02:25

To solve your problem, think about how to find the row numbers you are after.

Assume for the moment our dataframe has just one TrialNum in it. In your previous question, you learned that to find the row with the maximum value of B, you can use which.max(B).

Now you want to find the row where D is 1, so you can use which(D==1). If D equals 1 in several rows, which returns all of their indices (see ?which), so use [1] to keep just the index of the first occurrence. Since you don't want to include the D==1 row itself, add 1 to the index: which(D==1)[1] + 1.

When you have these two numbers, you just want all rows from the first to the second, inclusive, i.e. (which(D==1)[1] + 1):which.max(B).
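To make the index arithmetic concrete, here is a small base R sketch for a single trial. The vectors B and D are copied from the TrialNum 1 rows in the question; the variable names first_low, peak and rows are only illustrative.

```r
# Columns B and D for TrialNum 1, copied from the question's data
B <- c(1, 2, 3, 4, 5, 3, 5, 2)
D <- c(1, 2, 1, 3, 2, 1, 3, 1)

first_low <- which(D == 1)[1]  # index of the first row where D is 1
peak      <- which.max(B)      # index of the first row where B is at its maximum

rows <- (first_low + 1):peak   # rows after the cutoff row, up to and including the peak
rows
# [1] 2 3 4 5
```

Note that which.max returns the first maximum, so the second B == 5 at row 7 is ignored, which matches the expected output in the question.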

Then combine with by=TrialNum so that the expression is evaluated separately for each TrialNum, i.e. each group that .SD refers to contains only one TrialNum:

x[, .SD[(which(D==1)[1] + 1):which.max(B)], by=TrialNum]
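For a reproducible check, the sketch below rebuilds the question's data as a data.table and runs the expression on it. It assumes the data.table package is installed and that x holds exactly the 16 rows posted in the question; res is just an illustrative name.

```r
library(data.table)

# The sample data from the question (column C duplicates A there)
x <- data.table(
  TrialNum = rep(1:2, each = 8),
  Obs      = rep(1:8, times = 2),
  A = c(23, 21, 14, 34, 32, 21, 16, 18, 26, 11, 23, 12, 3, 4, 22, 15),
  B = c(1, 2, 3, 4, 5, 3, 5, 2, 1, 2, 3, 4, 2, 3, 1, 2),
  C = c(23, 21, 14, 34, 32, 21, 16, 18, 26, 11, 23, 12, 3, 4, 22, 15),
  D = c(1, 2, 1, 3, 2, 1, 3, 1, 1, 2, 1, 1, 1, 3, 1, 1)
)

# Within each TrialNum: rows after the first D == 1, up to the first max of B
res <- x[, .SD[(which(D == 1)[1] + 1):which.max(B)], by = TrialNum]
res
```

On this data, res should contain Obs 2 to 5 for TrialNum 1 and Obs 2 to 4 for TrialNum 2, which is the expected output listed in the question.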

(Note - what will you do if there is no row where D==1? You will have to think about how to handle that).
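One possible way to handle that case is sketched below. When no row matches, which(D==1)[1] is NA and the : operator fails; and when the first D==1 sits at or after the maximum of B, a:b counts downwards and picks the wrong rows. The helper name rows_after_cutoff and its cutoff argument are hypothetical, and returning no rows for such a trial is an assumption about what you want, not the only sensible policy.

```r
# Hypothetical helper: indices of the rows strictly after the first D == cutoff,
# up to and including the first maximum of B.
rows_after_cutoff <- function(D, B, cutoff = 1) {
  first_low <- which(D == cutoff)[1]  # NA if no row matches
  peak <- which.max(B)                # integer(0) if B is entirely NA
  # Assumption: keep nothing when there is no cutoff row, no peak,
  # or the cutoff row is at or after the peak.
  if (is.na(first_low) || length(peak) == 0L || first_low >= peak) {
    return(integer(0))
  }
  seq.int(first_low + 1L, peak)
}

rows_after_cutoff(D = c(1, 2, 1, 3, 2, 1, 3, 1), B = c(1, 2, 3, 4, 5, 3, 5, 2))  # rows 2 to 5
rows_after_cutoff(D = c(2, 3, 2), B = c(1, 5, 2))  # no D == 1: no rows
rows_after_cutoff(D = c(2, 3, 1), B = c(1, 5, 2))  # cutoff after the peak: no rows
```

It would then slot into the grouped call as x[, .SD[rows_after_cutoff(D, B)], by=TrialNum]; a trial for which the helper returns no indices should simply contribute no rows to the result.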

Thanks. I had tried one similar variation then got distracted with using a logical operator to join the index associated with D and max(B). (Edit) In reality, I'm going to use **>=** some minimum value so the original question may have been ill-posed saying D==1. — ksing, Jul 24 '15 at 02:38
