I have data which is unique at one variable Y. Another variable Z tells me how many people are in each of Y. My problem is that I want to create groups of 45 from these Y and Z. I mean that whenever the running total of Z touches 45, one group is made and the code moves on to create the next group.
My data looks something like this
ID X Y Z
1 A A 1
2 A B 5
3 A C 2
4 A D 42
5 A E 10
6 A F 2
7 A G 0
8 A H 3
9 A I 0
10 A J 8
11 A K 19
12 A L 3
13 A M 1
14 A N 1
15 A O 2
16 A P 0
17 A Q 1
18 A R 2
What is want is something like this
ID X Y Z CumSum Group
1 A A 1 1 1
2 A B 5 6 1
3 A C 2 8 1
4 A D 42 50 1
5 A E 10 10 2
6 A F 2 12 2
7 A G 0 12 2
8 A H 3 15 2
9 A I 0 15 2
10 A J 8 23 2
11 A K 19 42 2
12 A L 3 45 2
13 A M 1 1 3
14 A N 1 2 3
15 A O 2 4 3
16 A P 0 4 3
17 A Q 1 5 3
18 A R 2 7 3
Please let me know how I can achieve this with R.
EDIT: I extended the minimum reproducible example for more clarity
EDIT 2: I have one extra question on this topic. What if, the variable X
which is A
only right now is also changing. For example, it can be B
for a while then can go to being C
. How can I prevent the code from generating groups that are not within two categories of X
. For example if Group = 3
, then how can I make sure that 3 is not in category A
and B
?