I have a data frame df1

df1:

    a  c
1:  1  6
2:  2  8
3:  3  1
4: 45  3
5:  2  8

I need to count how many times each row occurs while keeping every row, including the duplicates. The result should look like this:

    a  c count
1:  1  6   1
2:  2  8   2
3:  3  1   1
4: 45  3   1  
5:  2  8   2

since rows 2 and 5 are duplicates. However, the only solution I have found collapses the duplicates and gives

    a  c count
1:  1  6   1
2:  2  8   2
3:  3  1   1
4: 45  3   1  

by doing

 library(data.table)
 df1 <- data.table(df1)
 df1[, .N, by = list(a, c)]

How could I get the desired result?

Bhargav Rao
user3171906
  • You're basically there: `dt[, count := .N, by = list(a, c)]` – Simon O'Hanlon Apr 08 '14 at 19:45
  • Hi, Take a bit of time and read the tag excerpt before tagging. [tag:dataframes] is for pandas, whereas you need [tag:data.frame] here. Be careful the next time. See this meta post. [Warn \[r\] users from adding \[dataframes\] tag instead of \[data.frame\] tag](http://meta.stackoverflow.com/q/318933) – Bhargav Rao Mar 14 '16 at 15:05
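
The one-liner from the first comment can be tried end to end as follows. This is a minimal sketch that rebuilds the question's data inline (the column values are copied from the question; the construction itself is an assumption about how `df1` was made):

```r
library(data.table)

# Rebuild the example data from the question
df1 <- data.table(a = c(1, 2, 3, 45, 2),
                  c = c(6, 8, 1, 3, 8))

# := adds the column by reference, so all five rows are kept;
# .N is the number of rows in the current (a, c) group
df1[, count := .N, by = list(a, c)]

print(df1)
```

Unlike `df1[, .N, by = list(a, c)]`, which returns one row per group, the `:=` form writes the group size back onto every row of that group.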

2 Answers

You may also do it in base R:

df1$count <- with(df1, ave(a, list(a, c), FUN = length))

df1
#     a c count
# 1:  1 6     1
# 2:  2 8     2
# 3:  3 1     1
# 4: 45 3     1
# 5:  2 8     2
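
A small variation on the same idea, offered as a sketch rather than as part of the answer above: `ave()` returns a vector of the same type as its first argument, so counting over `seq_len(nrow(df1))` instead of `a` gives an integer count regardless of the type of `a`. The data frame is rebuilt inline from the question's values.

```r
# Rebuild the example data from the question
df1 <- data.frame(a = c(1, 2, 3, 45, 2),
                  c = c(6, 8, 1, 3, 8))

# Group the row indices by (a, c) and replace each index with its group size.
# Using row indices as the first argument keeps the result integer even if
# the columns themselves are character or factor.
df1$count <- ave(seq_len(nrow(df1)), df1$a, df1$c, FUN = length)

print(df1)
```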
Henrik

For completeness, here's a way with dplyr

df <- data.frame(
  a = c(1, 2, 3, 45, 2),
  c = c(6, 8, 1, 3, 8)
)

library(dplyr)

# %>% replaces the %.% operator from early dplyr releases, which has since been removed
df %>% group_by(a, c) %>% mutate(count = n())

## Source: local data frame [5 x 3]
## Groups: a, c
## 
##    a c count
## 1  1 6     1
## 2  2 8     2
## 3  3 1     1
## 4 45 3     1
## 5  2 8     2
hadley
  • From `?n`: "This function is implemented special for each data source and **can only be used from within summarise**" (`dplyr` 0.1.3). But you are @hadley and can use it wherever you want! ;) +1 for magic. – Henrik Apr 08 '14 at 20:35
  • @Henrik hmmm, that should really say inside summarise, mutate, or filter – hadley Apr 09 '14 at 01:41
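
For readers on a current dplyr, the grouped `mutate(count = n())` pattern can probably be shortened with `add_count()`. This is a sketch that assumes a dplyr version recent enough to provide `add_count()` with its `name` argument; it is an alternative to, not a restatement of, the code in the answer above.

```r
library(dplyr)

# Same example data as in the dplyr answer
df <- data.frame(
  a = c(1, 2, 3, 45, 2),
  c = c(6, 8, 1, 3, 8)
)

# add_count() appends the size of each (a, c) group as a new column
# and keeps every row; `name` sets the column name
res <- df %>% add_count(a, c, name = "count")

print(res)
```

Unlike the `group_by()` version, the result here is not left grouped, so no `ungroup()` is needed afterwards.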