I have a dataset with two columns, matchid and roundnumber, which looks like this:

matchid    roundnumber
1522380841   1
1522380841   2
1522380841   1
1522380841   3
1522380841   2
1522380841   1
1522380841   1
1522380842   2
1522380842   2
1522380842   3
1522380842   1
1522380842   4
1522380842   1

I cannot figure out how to count how many distinct matchid values there are, ignoring the repeated roundnumber rows. For this example, the output should be:

count (matchid)
2

I think it needs a unique constraint, perhaps? For each matchid, there can be duplicate values of roundnumber, but I need to count each matchid just once. I just need to find out how many unique matchid values exist.

I tried using dplyr:

library(dplyr)
count(r6,var=r6$matchid,r6$roundnumber)

but I don't think it works correctly.

  • Are you trying to find the count per `matchid` + `roundnumber` combination? Shouldn't your output have `roundnumber` in it? – acylam Oct 02 '19 at 15:07
  • Do you want something like this: `table(paste(r6$matchid, r6$roundnumber, sep = '_'))`? – glagla Oct 02 '19 at 15:08
  • I am just trying to find out how many matches were played. Each unique `matchid` represents a match, but `roundnumber` can be duplicate for each `matchid`. I am not sure how to do that, or if I am asking the right question. – Saksham Chawla Oct 02 '19 at 15:15
  • `length(unique(r6$matchid))`? Not sure why you need `roundnumber` if you're only counting the number of unique matches – acylam Oct 02 '19 at 15:22
  • @avid_useR I am new to R and I still don't know the syntax. That answers my question. Out of curiosity, if I wanted to find out how many unique rounds were played in each match, how would I do that? – Saksham Chawla Oct 02 '19 at 15:30
  • Try this: `freq <- r6 %>% group_by(matchid,roundnumber) %>% slice(1) %>% ungroup %>% group_by(matchid) %>% summarise(n())` – Fateta Oct 02 '19 at 15:32
  • @Fateta woah, that works. I don't understand the code a bit, but it works! Thanks! – Saksham Chawla Oct 02 '19 at 15:35
  • Just write: `r6 %>% group_by(matchid) %>% summarize(n_distinct(roundnumber))` – acylam Oct 02 '19 at 15:40
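
The suggestions in the comments above can be tried on the sample data from the question. This is a minimal base R sketch, assuming the data frame is called `r6` as in the question; `tapply` is used here as a base R stand-in for the `n_distinct` suggestion, and the names `n_matches` and `rounds_per_match` are made up for illustration.

```r
# Sample data copied from the question; the name r6 follows the question.
r6 <- data.frame(
  matchid = c(rep(1522380841, 7), rep(1522380842, 6)),
  roundnumber = c(1, 2, 1, 3, 2, 1, 1, 2, 2, 3, 1, 4, 1)
)

# Number of matches played: count the distinct matchid values.
n_matches <- length(unique(r6$matchid))

# Distinct rounds per match: group roundnumber by matchid,
# then count the unique values within each group.
rounds_per_match <- tapply(
  r6$roundnumber, r6$matchid,
  function(x) length(unique(x))
)

print(n_matches)
print(rounds_per_match)
```

On this sample, `n_matches` is 2, and the two matches have 3 and 4 distinct rounds respectively.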

3 Answers

I think that the table function is what you are looking for:

table(r6$matchid)

For example:

letters = c('a', 'a', 'a', 'b', 'b', 'a', 'c')
table(letters)

Converting the result to a data frame can then be convenient:

data.frame(table(letters))
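
Applied to the question's data, the same idea might look like the sketch below. It assumes the `matchid` values from the question are held in a plain vector; the variable names `counts`, `counts_df`, and `n_matches` are illustrative only.

```r
# matchid values copied from the question's sample data.
matchid <- c(rep(1522380841, 7), rep(1522380842, 6))

# table() counts how many rows each distinct matchid has.
counts <- table(matchid)

# As a data frame, the result has one row per matchid
# with columns "matchid" and "Freq".
counts_df <- data.frame(counts)

# The number of distinct matches is the number of table entries.
n_matches <- length(counts)

print(counts_df)
print(n_matches)
```

Note that `table` returns the row count per `matchid`, so the number of matches played is the length of the table, not any single entry in it.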
glagla

If you insist on a dplyr solution:

letters = c('a', 'a', 'a', 'b', 'b', 'a', 'c') 

library(dplyr)
df <- data.frame(letters)
df %>% group_by(letters) %>% summarise(n())

# A tibble: 3 x 2
  letters `n()`
  <fct>   <int>
1 a           4
2 b           2
3 c           1
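
For the question's own columns, a dplyr version could be sketched as follows. It assumes a data frame named `r6` with the sample rows from the question, and it uses `n_distinct`, as suggested in the comments; the output column names are hypothetical.

```r
library(dplyr)

# Sample data copied from the question.
r6 <- data.frame(
  matchid = c(rep(1522380841, 7), rep(1522380842, 6)),
  roundnumber = c(1, 2, 1, 3, 2, 1, 1, 2, 2, 3, 1, 4, 1)
)

# One row: the number of distinct matches.
match_count <- r6 %>%
  summarise(n_matches = n_distinct(matchid))

# One row per match: rows recorded and distinct rounds played.
per_match <- r6 %>%
  group_by(matchid) %>%
  summarise(rows = n(), distinct_rounds = n_distinct(roundnumber))

print(match_count)
print(per_match)
```

Naming the columns inside `summarise` avoids awkward headers such as `n()` in the result.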
Kozolovska

With the data.table package this is really easy:

library(data.table)
# assuming your dataset is named "df"
df <- data.table(df)
df <- df[, list(count=.N), by=matchid] 

This should give you:

head(df)
matchid   count
1522380841 7
1522380842 6
.
.
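
If only the number of distinct matches is needed, rather than the row count per match, data.table's `uniqueN` may be a more direct fit. The sketch below assumes the sample data from the question; the names `dt`, `n_matches`, and `per_match` are illustrative.

```r
library(data.table)

# Sample data copied from the question.
dt <- data.table(
  matchid = c(rep(1522380841, 7), rep(1522380842, 6)),
  roundnumber = c(1, 2, 1, 3, 2, 1, 1, 2, 2, 3, 1, 4, 1)
)

# Number of distinct matches.
n_matches <- dt[, uniqueN(matchid)]

# Per match: number of rows and number of distinct rounds.
per_match <- dt[, .(count = .N, distinct_rounds = uniqueN(roundnumber)),
                by = matchid]

print(n_matches)
print(per_match)
```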
COLO