I have a dataframe with a column for distance and count, and a column with “T” for cases where distance >1 and “F” for cases where distance is <1. I would like to add a column that gives a unique number (starting at 1) for all rows where distance is >1 each time that there is a seperate block of distances >1 (in between periods when distance drops <1). For example, the first time that distance >1 then the value in my desired result column would be 1, until distance drops <1, when “NA” would be returned instead, then when distance increases >1 again the desired value would be 2 and so on! Please see the below example and desired output.
> dput(df)
structure(list(Count = c(45L, 76L, 23L, 2L, 67L, 34L, 103L, 34L,
88L, 45L, 76L, 23L, 11L, 56L, 22L, 23L, 45L, 66L, 7L, 8L, 2L,
34L, 31L, 45L, 77L, 2L, 34L, 11L, 10L, 2L, 45L, 67L, 56L, 67L,
44L, 54L, 23L, 34L, 22L, 12L, 34L, 22L, 1L, 12L, 12L), Distance = c(0.0066,
0.417, 0.894, 1.289, 1.7645, 2.177, 3.7475, 4.2306, 4.7326, 5.2144,
6.5687, 6.9534, 7.0745, 7.1011, 6.543, 5.0043, 5.0001, 4.7842,
3.22, 2.0045, 1.765, 0.2875, 1e-04, 0.0069, 0.0756, 0.7666, 1.0898,
2.2927, 2.9022, 3.8636, 4.0495, 3.318, 2.1792, 1.7837, 0.7964,
0.0062, 0.0066, 1.567, 2.8854, 3.5679, 2.5784, 1.1111, 0.7853,
0.2, 5e-04), OnTrip = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)), .Names = c("Count",
"Distance", "OnTrip"), class = "data.frame", row.names = c(NA,
-45L))
Desired output:
Count Distance OnTrip DesiredResult CumDist
1 45 0.0066 FALSE NA NA
2 76 0.4170 FALSE NA NA
3 23 0.8940 FALSE NA NA
4 2 1.2890 TRUE 1 1.2890
5 67 1.7645 TRUE 1 3.0535
6 34 2.1770 TRUE 1 4.8180
7 103 3.7475 TRUE 1 6.9950
8 34 4.2306 TRUE 1 10.7425
9 88 4.7326 TRUE 1 15.4751
10 45 5.2144 TRUE 1 20.2077
11 76 6.5687 TRUE 1 25.4221
12 23 6.9534 TRUE 1 31.9908
13 11 7.0745 TRUE 1 39.0653
14 56 7.1011 TRUE 1 46.1399
15 22 6.5430 TRUE 1 53.2410
16 23 5.0043 TRUE 1 59.7840
17 45 5.0001 TRUE 1 64.7841
18 66 4.7842 TRUE 1 69.7842
19 7 3.2200 TRUE 1 74.5684
20 8 2.0045 TRUE 1 77.7884
21 2 1.7650 TRUE 1 79.5534
22 34 0.2875 FALSE NA NA
23 31 0.0001 FALSE NA NA
24 45 0.0069 FALSE NA NA
25 77 0.0756 FALSE NA NA
26 2 0.7666 FALSE NA NA
27 34 1.0898 TRUE 2 1.0898
28 11 2.2927 TRUE 2 3.3825
29 10 2.9022 TRUE 2 5.6751
30 2 3.8636 TRUE 2 9.5387
31 45 4.0495 TRUE 2 13.4023
32 67 3.3180 TRUE 2 16.7203
33 56 2.1792 TRUE 2 20.0383
34 67 1.7837 TRUE 2 21.8221
35 44 0.7964 FALSE NA NA
36 54 0.0062 FALSE NA NA
37 23 0.0066 FALSE NA NA
38 34 1.5670 TRUE 3 1.5670
39 22 2.8854 TRUE 3 4.4524
40 12 3.5679 TRUE 3 7.3379
41 34 2.5784 TRUE 3 9.9163
42 22 1.1111 TRUE 3 12.4947
43 1 0.7853 FALSE NA NA
44 12 0.2000 FALSE NA NA
45 12 0.0005 FALSE NA NA
Once I have the desired result column, I would then like to add a second column for cumulative distance using the cumsum(df$DesiredResult)
function, but nested within the blocks of distance >1 (please see desired output).
I am really struggling to write the rules for this, so any help would be greatly appreciated. Thanks!