Filter dataframe based on aggregate count value of a column

Question

Consider a dataframe like the following:

Key | Value
A   | 1
A   | 2
B   | 2
C   | 3

I want to filter this dataframe so that it keeps only the rows whose key occurs more than once.

So the expected output is

Key | Value
A   | 1
A   | 2

What's the most succinct way of doing this in R? I'm looking for a generalized solution where the count threshold can be any n (count > n).

score 3 · Answer 1 · answered Oct 26 '17 at 05:44

We can use dplyr: group by Key and keep the groups with more than one row (replace 1 with any threshold n):

library(dplyr)
df1 %>%
    group_by(Key) %>%
    filter(n()>1)

Or with base R using table and subset

subset(df1, Key %in% names(which(table(Key) > 1)))
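
Since the question asks for a generalized threshold, here is a minimal base R sketch of the same table-based idea wrapped in a function. The name filter_by_count and its arguments are hypothetical, made up for illustration, and the sample data frame is rebuilt from the question.

```r
# Hypothetical helper (not from any package): keep the rows of `df` whose
# value in column `col` occurs more than `n` times.
filter_by_count <- function(df, col, n = 1) {
  counts <- table(df[[col]])            # occurrences of each key
  keep <- names(counts)[counts > n]     # keys above the threshold
  df[df[[col]] %in% keep, , drop = FALSE]
}

# Sample data from the question
df1 <- data.frame(Key = c("A", "A", "B", "C"), Value = c(1, 2, 2, 3))

filter_by_count(df1, "Key", n = 1)
```

With n = 1 this should return the two rows with Key A. Passing the column name as a string avoids the non-standard evaluation that subset relies on, which tends to be easier to reuse inside other functions.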

akrun

score 1 · Answer 2 · answered Oct 26 '17 at 05:50

Using data.table

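library(data.table)

df <- data.table(read.table(text = "Key  Value
A   1
A   2
B   2
C   3", header = TRUE))

# .N is the number of rows in the current group; return the group's rows (.SD) only if it has more than one
df[, if (.N > 1) .SD, by = Key]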

   Key Value
1:   A     1
2:   A     2
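
For a general threshold, one possible variant (a sketch, assuming the data.table package is installed) collects row indices with .I and subsets once, which keeps the original row and column order. The names dt, n, idx and res_dt are illustrative only.

```r
library(data.table)

# Sample data from the question
dt <- data.table(Key = c("A", "A", "B", "C"), Value = c(1, 2, 2, 3))
n <- 1  # threshold: keep keys that occur more than n times

# .I holds the row numbers of the current group; .N > n is a single
# TRUE/FALSE per group, so each group yields all of its rows or none
idx <- dt[, .I[.N > n], by = Key]$V1
res_dt <- dt[idx]
res_dt
```

The simpler form from the answer also generalizes directly by writing .N > n in place of .N > 1.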

Hardik Gupta