count of matching rows in R

Question

Below is a sample table/data frame. The third attribute (count) should give the number of rows that share the same combination of Attribute 1 and Attribute 2.

╔════╦═════════════╦═════════════╦══════════════════════════════╗
║ ID ║ Attribute 1 ║ Attribute 2 ║ count(Attribute1+Attribute2) ║
╠════╬═════════════╬═════════════╬══════════════════════════════╣
║  1 ║ A           ║ AA          ║                            3 ║
║  2 ║ B           ║ CC          ║                            1 ║
║  3 ║ C           ║ BB          ║                            2 ║
║  4 ║ A           ║ AA          ║                            3 ║
║  5 ║ C           ║ BB          ║                            2 ║
║  6 ║ D           ║ AA          ║                            1 ║
║  7 ║ B           ║ AA          ║                            1 ║
║  8 ║ C           ║ DD          ║                            1 ║
║  9 ║ A           ║ AB          ║                            1 ║
║ 10 ║ A           ║ AA          ║                            3 ║
╚════╩═════════════╩═════════════╩══════════════════════════════╝

Update:

Thanks akrun and danas.zuokas for the help. The final output I am expecting would look something like the table below, where I choose 50% of the rows from each count group. For example, for IDs 1, 4 and 10 the count is 3. I would need to choose only 2 of them (50%), hence I should get (A, AA) twice.

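╔════╦═════════════╦═════════════╦══════════════════════════════╗
║ ID ║ Attribute 1 ║ Attribute 2 ║ count(Attribute1+Attribute2) ║
╠════╬═════════════╬═════════════╬══════════════════════════════╣
║  1 ║ A           ║ AA          ║                            3 ║
║  2 ║ B           ║ CC          ║                            1 ║
║  3 ║ C           ║ BB          ║                            2 ║
║  4 ║ A           ║ AA          ║                            3 ║
║  6 ║ D           ║ AA          ║                            1 ║
║  7 ║ B           ║ AA          ║                            1 ║
║  8 ║ C           ║ DD          ║                            1 ║
║  9 ║ A           ║ AB          ║                            1 ║
╚════╩═════════════╩═════════════╩══════════════════════════════╝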

The 4th column is the count of `attribute1 and attribute 2` together. – Zahoor Kazi, Jan 06 '16 at 12:53
In the future, use `dput` or some other more amenable means of sharing your data. "Pretty" ASCII tables cannot easily be transferred into others' R sessions. – nrussell, Jan 06 '16 at 13:03
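
A minimal sketch of what nrussell's suggestion could look like for the sample table above. The column names `Attribute1` and `Attribute2` (without spaces) are an assumption, chosen to match the names used in the accepted answer. `dput` prints a plain-text representation that can be pasted into another R session.

```r
# Rebuild the sample table from the question as a data frame.
# Column names without spaces are an assumption, not the asker's actual names.
df <- data.frame(
  ID = 1:10,
  Attribute1 = c("A", "B", "C", "A", "C", "D", "B", "C", "A", "A"),
  Attribute2 = c("AA", "CC", "BB", "AA", "BB", "AA", "AA", "DD", "AB", "AA"),
  stringsAsFactors = FALSE
)

# Print a copy-pasteable representation of the data
dput(df)
```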

danas.zuokas · Accepted Answer · 2016-01-06T13:01:01.180

5

Given your data is in df:

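library(data.table)

# Convert the data frame to a data.table
dt <- as.data.table(df)
# Add a count column: .N is the number of rows in each (Attribute1, Attribute2) group
dt[, count := .N, by = list(Attribute1, Attribute2)]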

edited Jan 06 '16 at 13:01

answered Jan 06 '16 at 12:53


Any reason you're `paste`ing the columns? I assume this will take longer than just grouping by both..? – talat Jan 06 '16 at 12:57
You are right - just "translated" from SQL. – danas.zuokas Jan 06 '16 at 13:00
You are welcome Zahoor Kazi! You can also mark the answer as accepted. – danas.zuokas Jan 06 '16 at 13:03
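
The point raised in these comments can be sketched as follows: grouping on both columns directly and grouping on a pasted key give the same counts on the sample data. The data construction, the column name `count_paste` and the key name `pair` are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Sample data from the question (column names assumed to have no spaces)
dt <- data.table(
  ID = 1:10,
  Attribute1 = c("A", "B", "C", "A", "C", "D", "B", "C", "A", "A"),
  Attribute2 = c("AA", "CC", "BB", "AA", "BB", "AA", "AA", "DD", "AB", "AA")
)

# Group directly on both columns, as in the answer
dt[, count := .N, by = .(Attribute1, Attribute2)]

# Group on a pasted key, closer to a SQL-style concatenated key
dt[, count_paste := .N, by = .(pair = paste(Attribute1, Attribute2))]

# Both approaches agree on this data
identical(dt$count, dt$count_paste)
dt
```

Grouping on the two columns avoids building an extra character vector, and it cannot confuse distinct pairs: with the default separator, `paste("A B", "C")` and `paste("A", "B C")` both produce `"A B C"`.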

score 3 · Answer 2 · answered Jan 06 '16 at 12:53

We can try

library(dplyr)
# Group by both attributes and add the group size as a new column
df1 %>%
     group_by(attribute1, attribute2) %>%
     mutate(Count = n())

akrun


What is `%>%`? I am unable to run the code with that. – Zahoor Kazi Jan 06 '16 at 13:04
@ZahoorKazi It is the pipe/chain operator, which connects the lhs and rhs. – akrun Jan 06 '16 at 13:05
How can I choose only a few records (say 30%) for each count group? – Zahoor Kazi Jan 07 '16 at 09:02
@ZahoorKazi You may use `df1 %>% group_by(attribute1, attribute2) %>% slice(seq(round(0.3*n())))` – akrun Jan 07 '16 at 09:15
Thanks akrun, but that seems to give a different result. What I need is this: the rows with ID 1, 4 and 10 share the same values, so that combination appears 3 times. I need to select only 30% of them (1 row, irrespective of ID), and the same for the other count groups. – Zahoor Kazi Jan 07 '16 at 09:26
@ZahoorKazi Can you update your post with the expected output. It's not clear to me. Or better would be to post that as a separate question. – akrun Jan 07 '16 at 09:29
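
One possible reading of the expected output in the question's update is "keep the first `ceiling(fraction * group size)` rows of each (attribute1, attribute2) group". The sketch below follows that reading. Rounding up is an assumption, inferred from the fact that groups of size 1 are kept in the expected output; the variable name `fraction` and the data construction are also illustrative. The `%>%` operator becomes available once `dplyr` is installed and loaded with `library(dplyr)`.

```r
library(dplyr)  # provides %>%, group_by(), mutate(), filter(), n(), row_number()

# Sample data from the question, with the lowercase column names used in this answer
df1 <- data.frame(
  ID = 1:10,
  attribute1 = c("A", "B", "C", "A", "C", "D", "B", "C", "A", "A"),
  attribute2 = c("AA", "CC", "BB", "AA", "BB", "AA", "AA", "DD", "AB", "AA"),
  stringsAsFactors = FALSE
)

fraction <- 0.5  # share of rows to keep per group (hypothetical name)

result <- df1 %>%                        # x %>% f(y) is the same as f(x, y)
  group_by(attribute1, attribute2) %>%
  mutate(Count = n()) %>%                # group size, as in the answer
  filter(row_number() <= ceiling(fraction * n())) %>%  # keep the first rows of each group
  ungroup()

result$ID
# 1 2 3 4 6 7 8 9
```

`ceiling()` is used rather than `round()` because `round(0.5)` is 0 in R, which would drop every group of size 1. With `fraction <- 0.3`, the same code keeps 1 of the 3 (A, AA) rows.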
