
Below is the sample table/data frame. The third attribute (count) gives the number of rows that share the same (Attribute 1, Attribute 2) combination.

╔════╦═════════════╦═════════════╦══════════════════════════════╗
║ ID ║ Attribute 1 ║ Attribute 2 ║ count(Attribute1+Attribute2) ║
╠════╬═════════════╬═════════════╬══════════════════════════════╣
║  1 ║ A           ║ AA          ║                            3 ║
║  2 ║ B           ║ CC          ║                            1 ║
║  3 ║ C           ║ BB          ║                            2 ║
║  4 ║ A           ║ AA          ║                            3 ║
║  5 ║ C           ║ BB          ║                            2 ║
║  6 ║ D           ║ AA          ║                            1 ║
║  7 ║ B           ║ AA          ║                            1 ║
║  8 ║ C           ║ DD          ║                            1 ║
║  9 ║ A           ║ AB          ║                            1 ║
║ 10 ║ A           ║ AA          ║                            3 ║
╚════╩═════════════╩═════════════╩══════════════════════════════╝

Update:

Thanks akrun and danas.zuokas for the help. The final output I am expecting would look something like the table below, where I keep 50% of the rows in each count group. For example, IDs 1, 4 and 10 share the combination (A, AA), so their count is 3. I need to keep only 2 of those rows (roughly 50%), and likewise for every other count group, hence (A, AA) should appear twice.

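╔════╦═════════════╦═════════════╦══════════════════════════════╗
║ ID ║ Attribute 1 ║ Attribute 2 ║ count(Attribute1+Attribute2) ║
╠════╬═════════════╬═════════════╬══════════════════════════════╣
║  1 ║ A           ║ AA          ║                            3 ║
║  2 ║ B           ║ CC          ║                            1 ║
║  3 ║ C           ║ BB          ║                            2 ║
║  4 ║ A           ║ AA          ║                            3 ║
║  6 ║ D           ║ AA          ║                            1 ║
║  7 ║ B           ║ AA          ║                            1 ║
║  8 ║ C           ║ DD          ║                            1 ║
║  9 ║ A           ║ AB          ║                            1 ║
╚════╩═════════════╩═════════════╩══════════════════════════════╝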
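
One possible reading of the rule above, as a minimal base R sketch: keep the first ceiling(50%) rows of every (Attribute 1, Attribute 2) group. The column names `Attribute1` and `Attribute2`, the rounding up, and the choice of the first rows (rather than a random pick) are assumptions made for illustration; they happen to reproduce the IDs in the expected output.

```r
# Sample data copied from the first table; the column names are assumed.
df <- data.frame(
  ID = 1:10,
  Attribute1 = c("A", "B", "C", "A", "C", "D", "B", "C", "A", "A"),
  Attribute2 = c("AA", "CC", "BB", "AA", "BB", "AA", "AA", "DD", "AB", "AA"),
  stringsAsFactors = FALSE
)

# Size of each (Attribute1, Attribute2) group, repeated on every row.
df$count <- ave(df$ID, df$Attribute1, df$Attribute2, FUN = length)

# Position of each row inside its group (1, 2, 3, ...).
pos <- ave(df$ID, df$Attribute1, df$Attribute2, FUN = seq_along)

# Keep the first ceiling(0.5 * count) rows of every group (assumed rounding).
result <- df[pos <= ceiling(0.5 * df$count), ]
print(result)
```

With this rule a group of 3 keeps 2 rows, a group of 2 keeps 1, and a group of 1 keeps its single row. Replacing `ceiling` with `round` or `floor` would change which groups survive, so the rounding rule should be picked deliberately.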

2 Answers


Given your data is in df:

library(data.table)

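# Convert the data frame to a data.table
dt <- as.data.table(df)
# .N is the number of rows in each (Attribute1, Attribute2) group;
# := adds it as a new column on every row
dt[, count := .N, by = list(Attribute1, Attribute2)]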
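
To make the difference between counting per group and adding the count to every row concrete, here is a small sketch on the question's sample data. The column names `Attribute1` and `Attribute2` and the variable `per_group` are assumptions for illustration.

```r
library(data.table)

# Sample data copied from the question's table; the column names are assumed.
dt <- data.table(
  ID = 1:10,
  Attribute1 = c("A", "B", "C", "A", "C", "D", "B", "C", "A", "A"),
  Attribute2 = c("AA", "CC", "BB", "AA", "BB", "AA", "AA", "DD", "AB", "AA")
)

# Without :=, the result has one row per group and a column named N.
per_group <- dt[, .N, by = list(Attribute1, Attribute2)]

# With :=, dt keeps all 10 rows and gains a count column (updated by reference).
dt[, count := .N, by = list(Attribute1, Attribute2)]

print(per_group)
print(dt)
```

The `:=` form is the one that matches the table in the question, because every original row is kept and simply annotated with its group size.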
danas.zuokas

We can try

library(dplyr)
df1 %>%
  group_by(attribute1, attribute2) %>%
  mutate(Count = n())
akrun
  • What is `%>%`? I am unable to run the code with that – Zahoor Kazi Jan 06 '16 at 13:04
  • @ZahoorKazi It is the pipe/chain operator, which connects the lhs and rhs – akrun Jan 06 '16 at 13:05
  • How can I choose only a few records (say 30%) from each count group? – Zahoor Kazi Jan 07 '16 at 09:02
  • @ZahoorKazi You may use `df1 %>% group_by(attribute1, attribute2) %>% slice(seq(round(0.3*n())))` – akrun Jan 07 '16 at 09:15
  • Thanks akrun, but that seems to be giving a different result. What I need is this: the rows with ID 1, 4, 10 appear 3 times. Now I need to select only 30% of them (1 row, irrespective of ID), and the same for the other count groups – Zahoor Kazi Jan 07 '16 at 09:26
  • @ZahoorKazi Can you update your post with the expected output. It's not clear to me. Or better would be to post that as a separate question. – akrun Jan 07 '16 at 09:29
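
On the `%>%` question raised in the comments: the operator is provided by the magrittr package and re-exported by dplyr, so a "could not find function" error usually means `library(dplyr)` has not been run in the session. The sketch below shows that the piped call is just another way of writing a nested call; the three-row `df1` and the names `piped` and `nested` are made up for illustration.

```r
library(dplyr)

# Tiny illustrative data frame (not the question's data).
df1 <- data.frame(
  attribute1 = c("A", "B", "A"),
  attribute2 = c("AA", "CC", "AA")
)

# Piped form: the result on the left becomes the first argument on the right.
piped <- df1 %>%
  group_by(attribute1, attribute2) %>%
  mutate(Count = n())

# Equivalent nested form without the pipe.
nested <- mutate(group_by(df1, attribute1, attribute2), Count = n())

# Both forms compute the same per-group counts.
print(identical(piped$Count, nested$Count))
```

The piped form reads top to bottom in the order the steps run, which is the main reason it is preferred for longer chains.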