
I have a data.frame and I want to list all records which have duplicates in columns "bod" and "datum". There is a very nice function, duplicated(); unfortunately, it flags only the second and later occurrences, so the first record of each duplicated group is left out:

visits2[duplicated(visits2[,c('bod','datum')]),]

Such a handy function should be able to list all of the duplicates, shouldn't it? Or does R have a different handy function for that?

The only thing I was able to come up with is to call it twice like this, but that's pretty clumsy, so I consider it just a workaround:

visits2[duplicated(visits2[,c('bod','datum')]) | duplicated(visits2[,c('bod','datum')], fromLast = TRUE),]
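To make the difference between the two calls concrete, here is a minimal sketch on a made-up data frame. The values and the extra id column are invented for illustration; only the column names "bod" and "datum" come from the question.

```r
# Toy stand-in for visits2 (all values are hypothetical)
visits2 <- data.frame(
  bod   = c("A", "A", "B", "C", "C", "C"),
  datum = c("2021-01-01", "2021-01-01", "2021-01-02",
            "2021-01-03", "2021-01-03", "2021-01-04"),
  id    = 1:6
)

key <- visits2[, c("bod", "datum")]

# duplicated() alone flags only the second and later occurrences
later_only <- duplicated(key)
visits2[later_only, ]   # rows 2 and 5

# Scanning from both ends flags every member of a duplicated group
all_dups <- duplicated(key) | duplicated(key, fromLast = TRUE)
visits2[all_dups, ]     # rows 1, 2, 4 and 5
```

Storing the key columns in a variable first at least avoids repeating the subsetting expression twice.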

I feel that for such a common task R deserves a better solution than that! :-)

PS: please don't post answers writing "your own" functions for that... I know it can be done ;-) That's not the point. Perhaps the best thing would be to add a new option to duplicated().

Tomas
  • People have been scratching their heads about this before - some alternatives here: [Finding ALL duplicate rows, including “elements with smaller subscripts”](https://stackoverflow.com/questions/7854433/finding-all-duplicate-rows-including-elements-with-smaller-subscripts) – Henrik Mar 01 '21 at 14:23
  • If you want to show only duplicates, why not just drop the rows with unique values? – Wimpel Mar 01 '21 at 14:43
  • Another thread, but with the reverse question stackoverflow.com/q/35832931/10276092 maybe modified like: x[x$y %in% x[duplicated(x$y), "y"], ] – M.Viking Mar 01 '21 at 14:49
  • Thanks very much @Henrik! I see they came to the same workaround as I found. I'd like to push it a bit further :-) – Tomas Mar 01 '21 at 15:16
  • I think you need to clarify what you mean by "push it a bit further", otherwise the question may be considered a duplicate of the link I posted. Cheers – Henrik Mar 01 '21 at 16:43
  • I already wrote it in the question. – Tomas Mar 01 '21 at 19:18

1 Answer


Here is one possible workaround: count the rows in each "bod"/"datum" group and keep the groups with more than one row.

subset(
  visits2,
  ave(1:nrow(visits2),
    interaction(visits2[, c("bod", "datum")]),
    FUN = length
  ) > 1
)
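
As a quick sanity check, the sketch below runs the same ave() idea on a made-up data frame and compares it with the two-sided duplicated() call from the question. The data values and the id column are hypothetical. Passing drop = TRUE to interaction() is an optional tweak, not part of the code above; it only skips unused factor combinations.

```r
# Toy stand-in for visits2 (all values are hypothetical)
visits2 <- data.frame(
  bod   = c("A", "A", "B", "C", "C", "C"),
  datum = c("2021-01-01", "2021-01-01", "2021-01-02",
            "2021-01-03", "2021-01-03", "2021-01-04"),
  id    = 1:6
)

# Size of the (bod, datum) group that each row belongs to
n_in_group <- ave(
  seq_len(nrow(visits2)),
  interaction(visits2[, c("bod", "datum")], drop = TRUE),
  FUN = length
)

via_ave <- n_in_group > 1

# Reference result: the two-sided duplicated() call from the question
key <- visits2[, c("bod", "datum")]
via_duplicated <- duplicated(key) | duplicated(key, fromLast = TRUE)

visits2[via_ave, ]              # rows 1, 2, 4 and 5
all(via_ave == via_duplicated)  # TRUE on this toy data
```

One caveat: rows with NA in either key column get an NA group from interaction(), so the result for such rows deserves a separate check.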
ThomasIsCoding