In my reprex below:
- `RSA` is the output of a process that is to be analyzed and whose results are to be grouped.
- Each `RSA` group is observed over a varying range of days (`datenum`). `var1` varies less frequently, but each `var1` value is observed for the same 8 consecutive days.
- The `RSA` groups are to be numbered sequentially within the `var1` group; when a new `var1` is encountered, the `RSA` group numbering begins anew. `idx_objective` is the index that I am looking for.
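To make the numbering rule concrete, here is a minimal sketch for a single `var1` group, using only base R. The values are the `RSA` values of the first group ("aaa") from the reprex; the names `rsa_aaa`, `runs`, and `run_idx` are illustrative and not part of my actual code.

```r
# RSA values for the first var1 group ("aaa") in the reprex
rsa_aaa <- c(1, 1, 1, 0, -1, -1, 0, -1)

# rle() finds runs of consecutive equal values
runs <- rle(rsa_aaa)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run
run_idx <- rep(seq_along(runs$lengths), times = runs$lengths)

print(run_idx)
```

For this group the sketch yields 1 1 1 2 3 3 4 5, which is the index I am after: a new number every time `RSA` changes, even if the value has been seen earlier in the group.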
Reprex:
library(dplyr)

var1 <- c("aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd")
RSA <- c( 1,  1,  1,  0, -1, -1,  0, -1,
          0,  0,  0, -1, -1, -1,  1,  1,
         -1, -1,  0,  1,  1, -1, -1,  1,
          1, -1, -1,  1,  1,  0, -1,  1)
idx_objective <- c(1, 1, 1, 2, 3, 3, 4, 5,
                   1, 1, 1, 2, 2, 2, 3, 3,
                   1, 1, 2, 3, 3, 4, 4, 5,
                   1, 2, 2, 3, 3, 4, 5, 6)

# Add a per-group day counter (datenum) and place it right after var1
objective.df <- data.frame(var1, RSA, idx_objective) %>%
  group_by(var1) %>%
  mutate(datenum = 1:n()) %>%
  relocate(datenum, .after = var1)
I have reviewed many SO posts that appear to be similar:
[1] dplyr: group variables then assign unique names based on unique grouping (revolves around correct use of `cumsum`, which I think I am using correctly)
[https://stackoverflow.com/questions/40519129/how-to-assign-unique-id-for-group-of-duplicates]
[2] How to divide between groups of rows using dplyr
The last two don't seem applicable; two others are referenced in the approaches that follow:
Approach #1: using a change flag and cumsum
# chg_flg is 1 when RSA differs from the previous row, else 0
# (the NA produced by lag() on the first row of each group becomes 0)
objective.try1 <- objective.df %>%
  group_by(var1) %>%
  mutate(chg_flg = ifelse(lag(RSA) != RSA, 1, 0) %>%
           coalesce(0)) %>%
  relocate(chg_flg, .after = RSA) %>%
  relocate(datenum, .after = var1) %>%
  group_by(var1, chg_flg) %>%
  mutate(idx_objective_try = cumsum(chg_flg) + 1)
Results:
objective.try1 <- c(1, 1, 1, 2, 3, 1, 4, 5,
                    1, 1, 1, 2, 1, 1, 3, 1,
                    1, 1, 2, 3, 1, 4, 1, 5,
                    1, 2, 1, 3, 1, 4, 5, 6)
objective.df <- data.frame(var1, RSA, idx_objective, objective.try1) %>%
  group_by(var1) %>%
  mutate(datenum = 1:n()) %>%
  relocate(datenum, .after = var1)
Observation for `objective.try1`: rows 1-5 work, but row 6 incorrectly restarts the `idx` numbering. The numbering then resumes correctly, reflecting the `chg_flg`, until rows 13 and 14, where it is again incorrectly restarted. It is then correct for one row before being incorrect again at rows 16, 21, 23, 27, and 29.
Following the logic at row 6, for example: the previous `idx_objective_try` (row 5) is 3 and the `chg_flg` value at row 6 is zero, so the `idx_objective_try` at row 6 ought to be the correct value of 3. Why isn't it?
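One way to probe this outside of dplyr is the base R sketch below, which compares a `cumsum` taken over the whole `var1` group with a `cumsum` taken separately within each `chg_flg` value. It rests on my assumption that `group_by(var1, chg_flg)` makes `cumsum` run within each flag value rather than down the rows in order; `rsa_aaa`, `idx_whole`, and `idx_split` are illustrative names, and `ave()` stands in for the grouped `mutate`.

```r
# RSA values for the first var1 group ("aaa") in the reprex
rsa_aaa <- c(1, 1, 1, 0, -1, -1, 0, -1)

# Change flag: 1 where RSA differs from the previous row, 0 otherwise (first row is 0)
chg_flg <- c(0, as.integer(diff(rsa_aaa) != 0))

# cumsum taken over the whole group, in row order
idx_whole <- cumsum(chg_flg) + 1

# cumsum taken separately within each chg_flg value (assumed analogue of
# grouping by chg_flg before calling cumsum)
idx_split <- ave(chg_flg, chg_flg, FUN = cumsum) + 1

print(rbind(chg_flg, idx_whole, idx_split))
```

In this sketch `idx_split` reproduces the restart I see at row 6, while `idx_whole` does not, which suggests the second `group_by` is where the numbering goes wrong.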
Approach #2: using `match` and `duplicated`:
objective.try2 <- objective.df %>%
  group_by(var1) %>%  # var1 corresponds to "prop" in the SO post (both the slower moving variables)
  mutate(well_rep1 = match(RSA, unique(RSA)),     # "RSA" corresponds to "well" in the SO post (both the faster changing variables)
         well_rep2 = cumsum(!duplicated(RSA)))    # approach similar to above
Observation for `objective.try2`: most rows work, but again some rows do not, and the rows that fail are different from those in the first try.
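To show where the second approach departs from the target, here is a small base R sketch for the first `var1` group only. The vectors `rsa_aaa` and `idx_target` are copied from the reprex ("aaa" rows of `RSA` and `idx_objective`); the side-by-side printout is just an illustrative way to inspect them.

```r
# First var1 group ("aaa") from the reprex
rsa_aaa    <- c(1, 1, 1, 0, -1, -1, 0, -1)
idx_target <- c(1, 1, 1, 2, 3, 3, 4, 5)

# Position of each value among the distinct values seen in the group
well_rep1 <- match(rsa_aaa, unique(rsa_aaa))

# Running count of values that appear for the first time
well_rep2 <- cumsum(!duplicated(rsa_aaa))

print(rbind(rsa_aaa, idx_target, well_rep1, well_rep2))
```

Both columns agree with the target for the first six rows and then diverge once an `RSA` value recurs (the 0 at row 7 and the -1 at row 8): `match` reuses the earlier number and `duplicated` stops counting, because neither distinguishes a repeated value from a continued run.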
I would appreciate it if someone would point out what I am doing wrong.