R: counting for each row the amount of "similar" rows that satisfy some condition

Question

Let's say I have the following data table

    x y   z
 1: 1 1 100
 2: 1 1 100
 3: 1 1 100
 4: 1 2 200
 5: 1 2 100
 6: 1 2 100
 7: 2 1 200
 8: 2 1 200
 9: 2 1 100
10: 2 2 200
11: 2 2 200
12: 2 2 200

DT = setDT(structure(list(x = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L), y = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L
), z = c(100, 100, 100, 200, 100, 100, 200, 200, 100, 200, 200, 
200)), .Names = c("x", "y", "z"), row.names = c(NA, -12L), class = "data.frame"))

I need to create a new column which counts, for each row, the ammount of times a specific given value (100) appears in column z for the specific combination of x,y (that's what I'm refering to with "similar rows"). In fact, I need to do this two times because I need to add a new column for each of the possible values in z, namely 100 and 200.

The resulting data table should look as follows:

    x y   z hundred two_hundred
 1: 1 1 100       3           0
 2: 1 1 100       3           0
 3: 1 1 100       3           0
 4: 1 2 200       2           1
 5: 1 2 100       2           1
 6: 1 2 100       2           1
 7: 2 1 200       1           2
 8: 2 1 200       1           2
 9: 2 1 100       1           2
10: 2 2 200       0           3
11: 2 2 200       0           3
12: 2 2 200       0           3

Notice that all the rows who share the same combination of x and y would also share the same values in the columns hundred and two_hundred, regardless of their value in z.

Further, I would need to get rid of the redundant rows and the column z so I would have a shorter data table.

DT <- DT %>% unique(by=c("x","y"))
DT [,z:=NULL]

The final product would be this:

   x y hundred two_hundred
1: 1 1       3           0
2: 1 2       2           1
3: 2 1       1           2
4: 2 2       0           3

Which is the easiest way to do that? Also it would be interesting if both the counting and the deletion of the redundant rows could be done with one single command.

(dplyr commands if possible, please)

I'm new in programming in R, so I would appreciate if you guys could give me a simple language response, as I'm not yet familiarized with all the most common structures and procedures in R (the transition from cycle-like Java algorithms to matrix-wise in R has been quite tough...)

Thanks in advance. This is also my first question in this site.

You're looking for `dcast(DT, x+y ~ z, fill = 0L)`, I think. — Frank, Apr 12 '16 at 00:01
I'm going to go ahead and mark this question as a "duplicate" of a similar one, but let me know if you think it's not justified or have other questions. — Frank, Apr 12 '16 at 00:04
Another option in `data.table` that might be naive - `DT[,as.list(table(factor(z,levels=unique(DT$z)))), by=list(x,y)]` — thelatemail, Apr 12 '16 at 00:04

R: counting for each row the amount of "similar" rows that satisfy some condition

0 Answers0