I need help with R. Similar to the question filtering-a-dataframe-showing-only-duplicates, I wish to extract duplicates from a data frame with over 2,000 entries.
The first 15 rows of the data look like this:
run | id | Diff |
---|---|---|
1 | 20 | 0 |
1 | 4 | 1024 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 4 | 65 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 11 | 475 |
1 | 11 | 1 |
1 | 11 | 1 |
2 | 25 | 0 |
2 | 18 | 0 |
2 | 18 | 1 |
2 | 18 | 1 |
2 | 18 | 1 |
I wish to extract only the duplicates, i.e. the rows whose id appears more than once:
run | id | Diff |
---|---|---|
1 | 4 | 1024 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 4 | 65 |
1 | 4 | 1 |
1 | 4 | 1 |
1 | 11 | 475 |
1 | 11 | 1 |
1 | 11 | 1 |
2 | 18 | 0 |
2 | 18 | 1 |
2 | 18 | 1 |
2 | 18 | 1 |
Using the command

    mydata_extract %>% group_by(id) %>% filter(n() > 1)

does not extract the duplicates; instead, I get the complete data set back. Is there something about "filter(n() > 1)" that I need to change? I'm a beginner with R.
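
One way to check why every row survives the filter is to count the rows per group. The sketch below assumes dplyr is installed and uses a small made-up data frame (`example_data` is hypothetical, not my real data) in which ids recur across runs. That is only a guess at what might be happening in the full data set.

```r
library(dplyr)

# Hypothetical stand-in data: ids 20 and 4 both appear in more than one run
example_data <- data.frame(
  run  = c(1, 1, 1, 2, 2),
  id   = c(20, 4, 4, 20, 4),
  Diff = c(0, 1024, 1, 0, 1)
)

# Group sizes when grouping by id alone
by_id <- example_data %>% count(id)

# Group sizes when grouping by run and id together
by_run_id <- example_data %>% count(run, id)

print(by_id)
print(by_run_id)
```

If every id in `by_id` has a count above 1, then `filter(n() > 1)` after `group_by(id)` keeps every row, even though some run/id combinations in `by_run_id` occur only once.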
Sorry, my data table is not formatting correctly; it looks okay in the preview!
I will also want to group my data by "run" first.
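
As a minimal sketch of what grouping by "run" first could look like, assuming dplyr is installed: the data frame below is the 15 sample rows rebuilt by hand, and `duplicates` is just an illustrative name. Grouping by both columns means a row is kept only when its id occurs more than once within the same run.

```r
library(dplyr)

# The first 15 rows from the table above, rebuilt by hand
mydata_extract <- data.frame(
  run  = c(rep(1, 10), rep(2, 5)),
  id   = c(20, rep(4, 6), rep(11, 3), 25, rep(18, 4)),
  Diff = c(0, 1024, 1, 1, 65, 1, 1, 475, 1, 1, 0, 0, 1, 1, 1)
)

# Keep only rows whose run/id combination occurs more than once
duplicates <- mydata_extract %>%
  group_by(run, id) %>%
  filter(n() > 1) %>%
  ungroup()

print(duplicates)
```

On these sample rows, the single-row groups (id 20 in run 1 and id 25 in run 2) are dropped and the other 13 rows remain. Whether this also resolves the problem on the full data depends on whether ids really do repeat across runs there.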