
I specifically started to think about this problem while trying to get the values from a vector that are not repeated. `unique` is not good (as far as I could tell from the documentation) because it also returns the repeated elements, just once each. `duplicated` has the same problem, since it returns FALSE the first time it encounters a value that is duplicated. This was my workaround:

> d=c(1,2,4,3,4,6,7,8,5,10,3)
> setdiff(d,unique(d[duplicated(d)]))
[1]  1  2  6  7  8  5 10

The following is a more general approach

> table(d)->g
> as.numeric(names(g[g==1]))
[1]  1  2  5  6  7  8 10

which we can generalize to counts other than 1. But I find this solution a bit clumsy, converting strings back into numbers. Is there a better or more straightforward way to get this vector?
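For example, the same table of counts `g` can be filtered for any other frequency, say the values that appear exactly twice:

> as.numeric(names(g[g==2]))
[1] 3 4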

  • I think that out of all the proposed answers, your `table` one is the least clumsy one. Efficient, less code, no external packages required. – David Arenburg Sep 30 '14 at 15:17

5 Answers


You could sort the values, then use rle to get the values that appear n times consecutively.

rl <- rle(sort(d))

rl$values[rl$lengths==1]
## [1]  1  2  5  6  7  8 10

rl$values[rl$lengths==2]
## [1] 3 4
– James Trimble

You could also do something like this in base R.

as.numeric(levels(factor(d))[tabulate(factor(d)) == 1])
# [1]  1  2  5  6  7  8 10

I've used factor and levels to make the approach more general (so "d" can include negative values and 0s).
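A quick check with a hypothetical vector containing a zero and negative values (this example is not from the original answer):

d2 <- c(-1, 0, 0, 2, 2, 3)
as.numeric(levels(factor(d2))[tabulate(factor(d2)) == 1])
# [1] -1  3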


Of course, even for something like this, you can expect a performance boost from "data.table", with which you can do something like:

library(data.table)
as.data.table(d)[, .N, by = d][N == 1]$d
# [1]  1  2  6  7  8  5 10
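Since the question asks about generalizing to counts other than 1, only the filter needs to change; for instance, a sketch for values seen exactly twice:

as.data.table(d)[, .N, by = d][N == 2]$d
# [1] 4 3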
– A5C1D2H2I1M1N2O1R2T1

The one-liner here is completely unnecessary, but one-liners are always nice.

Say you want to find all the elements that occur exactly 2 times. Using the plyr package:

library(plyr)
count(d)$x[count(d)$freq == 2]
# Output
# [1] 3 4
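As a small variation (not part of the original answer, and assuming plyr is already loaded as above), storing the result of count(d) avoids evaluating it twice:

cnt <- count(d)
cnt$x[cnt$freq == 2]
# [1] 3 4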
– DMT

You can use `duplicated` for the n = 1 case: just call it twice and use the `fromLast` argument.

sort(d[! (duplicated(d) | duplicated(d, fromLast=TRUE))])
# [1]  1  2  5  6  7  8 10
– Matthew Plourde

I prefer the other answers, but this seemed like a good excuse to test my skills with dplyr:

library(dplyr)
as.data.frame(table(d)) %>%
  filter(Freq == 1) %>%
  select(d)
#    d
# 1  1
# 2  2
# 3  5
# 4  6
# 5  7
# 6  8
# 7 10
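If you prefer a plain numeric vector like the other answers return, a sketch (not from the original answer, and assuming dplyr is still loaded); note that table() stores the values as a factor, hence the double conversion:

res <- as.data.frame(table(d)) %>%
  filter(Freq == 1)
as.numeric(as.character(res$d))
# [1]  1  2  5  6  7  8 10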
– Chase