How to start a new count when the value of a categorical variable has changed in R

Question

Sample of mydataset

 datatrain=structure(list(DELT = c(10266L, 10266L, 10266L, 9635L, 9635L, 
    9635L, 10334L, 10334L, 10061L, 10061L, 10061L, 9512L, 9512L, 
    9512L, 10394L, 10394L, 9631L, 10376L, 10376L, 10376L, 10376L, 
    10046L, 9678L, 10332L, 10332L, 9985L, 9850L, 9850L, 10074L, 9746L, 
    9746L), EP_OBJECTID = c(86913544L, 86913544L, 86913544L, 86913544L, 
    86913544L, 86913544L, 86913544L, 86913544L, 86913544L, 86913544L, 
    86913544L, 86913544L, 86913544L, 86913544L, 86913544L, 86913544L, 
    86913544L, 86913544L, 86913544L, 86913544L, 90093693L, 90093693L, 
    90093693L, 90093693L, 90093693L, 90093693L, 90093693L, 90093693L, 
    90093693L, 90093693L, 90093693L), DELTDMR = c(0L, 0L, 0L, 8L, 
    8L, 8L, 0L, 0L, 2L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 65L, 65L, 
    65L, 65L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 3L)), class = "data.frame", row.names = c(NA, 
    -31L))

EP_OBJECTID is categorical variable. Here only two categories

86913544
90093693

how to set the sequence number from the first value of the category 86913544 to the last number before the new category 90093693 in range 10000? i.e. result

DELT    EP_OBJECTID DELTDMR
10000   86913544    0
20000   86913544    0
30000   86913544    0
40000   86913544    8
50000   86913544    8
60000   86913544    8
70000   86913544    0
80000   86913544    0
90000   86913544    2
100000  86913544    2
110000  86913544    2
120000  86913544    0
130000  86913544    0
140000  86913544    0
150000  86913544    0
160000  86913544    0
170000  86913544    0
180000  86913544    65
190000  86913544    65
200000  86913544    65

And how then to remove the last value of category =86913544 (= 65) which goes before the first value of the new category. In this example, the new category 90093693 also has a value of 65, but new counting begin from second value of category 90093693(=0). Also in range 10000

I.E. result

DELT    EP_OBJECTID DELTDMR
10000   86913544    0
20000   86913544    0
30000   86913544    0
40000   86913544    8
50000   86913544    8
60000   86913544    8
70000   86913544    0
80000   86913544    0
90000   86913544    2
100000  86913544    2
110000  86913544    2
120000  86913544    0
130000  86913544    0
140000  86913544    0
150000  86913544    0
160000  86913544    0
170000  86913544    0
180000  86913544    65
190000  86913544    65
200000  86913544    65

10000 90093693 0

20000   90093693    0
30000   90093693    0
40000   90093693    0
50000   90093693    0
60000   90093693    0
70000   90093693    0
80000   90093693    0
90000   90093693    3
100000  90093693    3

and this for each category. How to perform it?

score 1 · Accepted Answer · answered Feb 10 '21 at 11:44

Remove the rows where EP_OBJECTID is not equal to previous value of EP_OBJECTID and DELTDMR is equal to previous value of DELTDMR. For each EP_OBJECTID create a sequence which starts at 10000 with a step of 10000.

library(dplyr)

datatrain %>%
  filter(!(EP_OBJECTID != lag(EP_OBJECTID) & DELTDMR == lag(DELTDMR))) %>%
  group_by(EP_OBJECTID) %>%
  mutate(DELT = seq(10000, length.out = n(), by = 10000))

This returns :

#     DELT EP_OBJECTID DELTDMR
#1   10000    86913544       0
#2   20000    86913544       0
#3   30000    86913544       8
#4   40000    86913544       8
#5   50000    86913544       8
#6   60000    86913544       0
#7   70000    86913544       0
#8   80000    86913544       2
#9   90000    86913544       2
#10 100000    86913544       2
#11 110000    86913544       0
#12 120000    86913544       0
#13 130000    86913544       0
#14 140000    86913544       0
#15 150000    86913544       0
#16 160000    86913544       0
#17 170000    86913544      65
#18 180000    86913544      65
#19 190000    86913544      65
#20  10000    90093693       0
#21  20000    90093693       0
#22  30000    90093693       0
#23  40000    90093693       0
#24  50000    90093693       0
#25  60000    90093693       0
#26  70000    90093693       0
#27  80000    90093693       0
#28  90000    90093693       3
#29 100000    90093693       3

How to start a new count when the value of a categorical variable has changed in R

1 Answers1