Getting a count of how many times a value in a column is duplicated

Question

I have a data frame in R (using RStudio) that looks like the following example:

address = c("123 fake st", "124 fake st", "125 fake st", "126 fake st", "123 jerry st", "124 road rd",
            " 125 tiny rd", " 126 cool r")
name = c("joey", "rachel", "ross", "chandler", "monika", "joey", "ross", "ross")
other = c(1, 1, 1, 2, 2, 3, 4, 4)
df <- data.frame(address, name, other)

This represents a dataset that is basically a series of addresses, the owner names, and a bunch of other columns I need to keep. I want to count how many times each owner shows up, in its own column, so that it looks like this:

       address     name other  count
1  123 fake st     joey     1     2
2  124 fake st   rachel     1     1
3  125 fake st     ross     1     3
4  126 fake st chandler     2     1
5 123 jerry st   monika     2     1
6  124 road rd     joey     3     2
7  125 tiny rd     ross     4     3
8   126 cool r     ross     4     3

Based on some other solutions, I tried this and got the following error:

df$count <- group_size(group_by(df,name))
Error in `$<-.data.frame`(`*tmp*`, count, value = c(1L, 2L, 1L, 1L, 3L : 
replacement has 5 rows, data has 8

I thought the cause might be that names without a duplicate had nothing to put in the count column, so I tried an (extremely inelegant) ifelse and got the same problem:

df$count <- ifelse((group_size(group_by(df,name))), (group_size(group_by(df,name))), 1)
Error in `$<-.data.frame`(`*tmp*`, count, value = c(1L, 2L, 1L, 1L, 3L : 
  replacement has 5 rows, data has 8

Any advice?

Whoops, my bad; yes and edited in the original post – tchoup Jul 13 '21 at 23:03
Reopened as the issue is about `group_size` output – akrun Jul 14 '21 at 17:52

akrun · Accepted Answer · 2021-07-13T23:16:18.187

3

If we need to create a count column, use add_count from dplyr:

library(dplyr)

df %>% 
   add_count(name, name = "new_count")

-output

       address     name other count new_count
1  123 fake st     joey     1     2         2
2  124 fake st   rachel     1     1         1
3  125 fake st     ross     1     3         3
4  126 fake st chandler     2     1         1
5 123 jerry st   monika     2     1         1
6  124 road rd     joey     3     2         2
7  125 tiny rd     ross     4     3         3
8   126 cool r     ross     4     3         3

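group_size returns only one count per group (5 values here, one for each distinct name), not one value per row, so it cannot be assigned to the 8-row data frame: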

group_size(group_by(df,name))
[1] 1 2 1 1 3
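
To illustrate the mismatch, here is a minimal sketch, assuming dplyr is installed and using a cut-down copy of the question's data (the names `sizes` and `df_counted` are only illustrative). `group_size()` yields one number per group, whereas `n()` inside `mutate()` repeats each group's size on every row of that group:

```r
library(dplyr)

# Cut-down copy of the question's data (address column omitted for brevity)
df <- data.frame(
  name  = c("joey", "rachel", "ross", "chandler", "monika", "joey", "ross", "ross"),
  other = c(1, 1, 1, 2, 2, 3, 4, 4)
)

# One value per group: 5 distinct names, so 5 values
sizes <- group_size(group_by(df, name))
length(sizes)  # 5, which cannot fill an 8-row column

# One value per row: n() is evaluated per group and recycled across its rows
df_counted <- df %>%
  group_by(name) %>%
  mutate(count = n()) %>%
  ungroup()

df_counted$count
```

This should behave like the `add_count()` call above; the explicit `group_by()`/`mutate()` form just makes the per-row expansion visible.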

edited Jul 13 '21 at 23:16

answered Jul 13 '21 at 23:01

akrun


Sorry, clicked enter too soon; it works on my trial dataset but not on my big one, where I get- Error in set(x, j = name, value = value) : Supplied 159 items to be assigned to 3787 items of column 'mail_name_dupe'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code. – tchoup Jul 13 '21 at 23:14
@tchoup I think that error occurs because a column with the same name, i.e. 'n', may already exist in your data. Try specifying `name = "new_n"` or `name = "new_name"` as in the update – akrun Jul 13 '21 at 23:16

blorp6 · Answer 2 · 2021-07-13T23:15:48.120

I am assuming that you are using the dplyr package, since you use the functions group_by() and group_size() in your code.

Here is the most direct way to get the answer to "how many times is each name repeated?":

library(dplyr)

summarise(group_by(df, name), repeats = n())

If you want to use the magrittr pipe operator, you can also write it as:

library(dplyr)
library(magrittr)  # optional: dplyr already re-exports %>%

df %>% 
  group_by(name) %>% 
  summarise(repeats = n())

Both of these will output:

# A tibble: 5 x 2
  name          repeats
  <chr>           <int>
1 chandler            1
2 joey                2
3 monika              1
4 rachel              1
5 ross                3
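
Note that this gives one row per name rather than a new column on the original data. If the per-row count from the question is what is needed, one option is to join the summary back onto the data frame. The following is a sketch under the assumption that dplyr is installed; `counts` and `df_counted` are illustrative names, and the data is a cut-down copy of the question's example:

```r
library(dplyr)

# Cut-down copy of the question's data (address column omitted for brevity)
df <- data.frame(
  name  = c("joey", "rachel", "ross", "chandler", "monika", "joey", "ross", "ross"),
  other = c(1, 1, 1, 2, 2, 3, 4, 4)
)

# One row per name with its number of occurrences
counts <- df %>%
  group_by(name) %>%
  summarise(repeats = n())

# Attach the per-name count to every original row; left_join keeps df's row order
df_counted <- left_join(df, counts, by = "name")

df_counted$repeats
```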

score 0 · Answer 3 · answered Jul 13 '21 at 23:21

You could also add the count column as follows:

df$count <- table(df$name)[df$name]
df

#        address     name other count
# 1  123 fake st     joey     1     2
# 2  124 fake st   rachel     1     1
# 3  125 fake st     ross     1     3
# 4  126 fake st chandler     2     1
# 5 123 jerry st   monika     2     1
# 6  124 road rd     joey     3     2
# 7  125 tiny rd     ross     4     3
# 8   126 cool r     ross     4     3
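
One caveat, offered as a hedged note: a column built directly from `table()` may carry the table's class and names along with it. If a plain integer column is preferred, a small variation in base R could look like the sketch below (`count_ave` is an illustrative column name, and the data is a cut-down copy of the question's example):

```r
# Cut-down copy of the question's data (address column omitted for brevity)
df <- data.frame(
  name  = c("joey", "rachel", "ross", "chandler", "monika", "joey", "ross", "ross"),
  other = c(1, 1, 1, 2, 2, 3, 4, 4)
)

# Look up each row's name in the frequency table, then drop class and names
df$count <- as.integer(table(df$name)[df$name])

# Alternative base R idiom: ave() applies length() within each name group
# and returns a vector aligned with the original rows
df$count_ave <- ave(seq_along(df$name), df$name, FUN = length)

df
```

Both columns should hold the same values; `ave()` avoids building the intermediate table.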
