0

I have two requirements

  1. find all duplicate values in single column
  2. find all unique rows [opposite to first question] This should not include even single pair from duplicated rows

I'm Learning since last 2 weeks. Watching YouTube videos, Referring Stackoverflow and other websites, so not much. Please do refer if any material or courses.

so answer to my first question i found here (Find duplicated elements with dplyr)

# All duplicated elements
mtcars %>%
  filter(carb %in% unique(.[["carb"]][duplicated(.[["carb"]])]))

So i want opposite of this

Thanks

P.S. I have non technical background. I went through couple of questions and answers here, so i might have found the answer or needed some of tweaks and i totally ignored that

1S1a4m9
  • 27
  • 7
  • Do you need `mtcars %>% + filter(!(carb %in% unique(.[["carb"]][duplicated(.[["carb"]])])))` – akrun Jul 16 '19 at 13:05
  • 2
    This is quite unclear. For one thing, "find all duplicate values in single column and return the all rows [2 or more]" isn't a grammatical sentence. For another thing, you haven't provided any idea about what your data looks like. – John Coleman Jul 16 '19 at 13:05

1 Answers1

2

As you probably realised, unique and duplicated don’t quite what you need, because they essentially cause the retention of all distinct values, and just collapse “multiple copies” of such values.

For your first question, you can group_by the column that you’re interested in, and then retain just those groups (via filter) which have more than one row:

mtcars %>%
    group_by(mpg) %>%
    filter(length(mpg) > 1) %>%
    ungroup()

This example selects all rows for which the mpg value is duplicated. This works because, when applied to groups, dplyr operations such as filter work on each group individually. This means that length(mpg) in the above code will return the length of the mpg column vector of each group, separately.

To invert the logic, it’s enough to invert the filtering condition:

mtcars %>%
    group_by(mpg) %>%
    filter(length(mpg) == 1) %>%
    ungroup()
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214