Let's say I have the following data table
x y z
1: 1 1 100
2: 1 1 100
3: 1 1 100
4: 1 2 200
5: 1 2 100
6: 1 2 100
7: 2 1 200
8: 2 1 200
9: 2 1 100
10: 2 2 200
11: 2 2 200
12: 2 2 200
DT = setDT(structure(list(x = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), y = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L
), z = c(100, 100, 100, 200, 100, 100, 200, 200, 100, 200, 200,
200)), .Names = c("x", "y", "z"), row.names = c(NA, -12L), class = "data.frame"))
I need to create a new column which counts, for each row, the ammount of times a specific given value (100) appears in column z for the specific combination of x,y (that's what I'm refering to with "similar rows"). In fact, I need to do this two times because I need to add a new column for each of the possible values in z, namely 100 and 200.
The resulting data table should look as follows:
x y z hundred two_hundred
1: 1 1 100 3 0
2: 1 1 100 3 0
3: 1 1 100 3 0
4: 1 2 200 2 1
5: 1 2 100 2 1
6: 1 2 100 2 1
7: 2 1 200 1 2
8: 2 1 200 1 2
9: 2 1 100 1 2
10: 2 2 200 0 3
11: 2 2 200 0 3
12: 2 2 200 0 3
Notice that all the rows who share the same combination of x and y would also share the same values in the columns hundred and two_hundred, regardless of their value in z.
Further, I would need to get rid of the redundant rows and the column z so I would have a shorter data table.
DT <- DT %>% unique(by=c("x","y"))
DT [,z:=NULL]
The final product would be this:
x y hundred two_hundred
1: 1 1 3 0
2: 1 2 2 1
3: 2 1 1 2
4: 2 2 0 3
Which is the easiest way to do that? Also it would be interesting if both the counting and the deletion of the redundant rows could be done with one single command.
(dplyr commands if possible, please)
I'm new in programming in R, so I would appreciate if you guys could give me a simple language response, as I'm not yet familiarized with all the most common structures and procedures in R (the transition from cycle-like Java algorithms to matrix-wise in R has been quite tough...)
Thanks in advance. This is also my first question in this site.