1

I'm having issues removing just the right amount of information from the following data:

18,14,17,2,9,8
17,17,17,14
18,14,17,2,1,1,1,1,9,8,1,1,1

I'm applying !duplicated() in order to remove the duplicates.

SplitFunction <- function(x) {
  b <- unlist(strsplit(x, '[,]'))
  c <- b[!duplicated(b)]
  return(paste(c, collapse=","))
}

I'm having issues removing only consecutive duplicates. The result below is what I'm getting.

18,14,17,2,9,8
17,14
18,14,17,2,1,9,8

The data below is what I want to obtain.

18,14,17,2,9,8
17,14
18,14,17,2,1,9,8,1

Can you suggest a way to perform this? Ideally a vectorized approach...
Thanks,
Miguel

mik
  • Please provide a reproducible sample of your data (using `dput()`). – Sotos Aug 17 '16 at 09:22
  • c("18,14,17,2,9,8", "17,17,17,14", "14,17,18,2,9,8,1", "18,14,17,11,8,9,8,8,22,13,6", "14,17,2,9,8", "18,14,17,2,1,1,1,1,1,1,1,1,9,8,1,1,1,1") This is the head of the data. It is a column in a data.table, but it can be changed to other format. – mik Aug 17 '16 at 09:25

3 Answers

5

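You can use the rle function to solve this. It encodes each run of consecutive equal values, so its values element holds the sequence with consecutive duplicates collapsed.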

xx <- c("18,14,17,2,9,8","17,17,17,14","18,14,17,2,1,1,1,1,9,8,1,1,1")
zz <- strsplit(xx,",")
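# $values holds one entry per run; paste it back into a comma-separated string
sapply(zz, function(x) paste(rle(x)$values, collapse = ","))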

See also this related question: How to remove/collapse consecutive duplicate values in sequence in R?
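To make the idea concrete, here is a small sketch of what rle returns for one record from the question. The variable names (`x`, `r`, `collapsed`) are only illustrative and not part of the original answer.

```r
# One record from the question, split into its elements
x <- strsplit("18,14,17,2,1,1,1,1,9,8,1,1,1", ",")[[1]]

r <- rle(x)
r$lengths  # length of each run of consecutive equal values
r$values   # one value per run, i.e. consecutive duplicates collapsed

# Join the run values back into a comma-separated string
collapsed <- paste(r$values, collapse = ",")
print(collapsed)
```

Because only consecutive repeats form a run, the trailing 1s survive as a separate run instead of being dropped the way !duplicated() drops them.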

chunjin
3

We can use rle: set every run length to 1, then rebuild the vector with inverse.rle.

sapply(strsplit(x, ','), function(x) paste(inverse.rle(within.list(rle(x), 
            lengths <- rep(1, length(lengths)))), collapse=","))
#[1] "18,14,17,2,9,8"     "17,14"              "18,14,17,2,1,9,8,1"

data

x <- c('18,14,17,2,9,8', '17,17,17,14', '18,14,17,2,1,1,1,1,9,8,1,1,1')
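The one-liner above can also be unpacked step by step, which may be easier to follow. This is only a sketch of the same idea: the helper name `collapse_runs` is hypothetical, and `vapply` is used instead of `sapply` purely to guarantee a character result.

```r
x <- c('18,14,17,2,9,8', '17,17,17,14', '18,14,17,2,1,1,1,1,9,8,1,1,1')

# Hypothetical helper: collapse consecutive duplicates in one comma-separated string
collapse_runs <- function(s) {
  parts <- strsplit(s, ",", fixed = TRUE)[[1]]
  r <- rle(parts)                          # run-length encode the elements
  r$lengths <- rep(1L, length(r$lengths))  # shorten every run to length 1
  paste(inverse.rle(r), collapse = ",")    # decode and join back into a string
}

out <- vapply(x, collapse_runs, character(1), USE.NAMES = FALSE)
print(out)
```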
akrun
  • Instead of `inverse.rle`, you could just extract the "values" element -- `sapply(strsplit(x, ','), function(X) paste(rle(X)$values, collapse = ","))` – alexis_laz Aug 17 '16 at 09:32
2

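Great rle answers. This is just to add an alternative without rle: compare each element with its successor and keep it only where the two differ. This gives a list of character vectors (strsplit returns strings), but it can easily be extended to return strings: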

numbers <- c("18,14,17,2,9,8", "17,17,17,14", "14,17,18,2,9,8,1", "18,14,17,11,8,9,8,8,22,13,6", "14,17,2,9,8", "18,14,17,2,1,1,1,1,1,1,1,1,9,8,1,1,1,1") 
result <- sapply(strsplit(numbers, ","), function(x) x[x!=c(x[-1],Inf)])
print(result)
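For completeness, a sketch of the string-returning variant mentioned above. It reuses the same comparison and only adds a paste step; `result_strings` is an illustrative name, and a shorter sample of the data is used here.

```r
numbers <- c("18,14,17,2,9,8", "17,17,17,14", "18,14,17,2,1,1,1,1,9,8,1,1,1")

result_strings <- vapply(
  strsplit(numbers, ","),
  # Keep an element only if it differs from the next one; the Inf sentinel
  # (coerced to the string "Inf") ensures the last element is always kept
  function(x) paste(x[x != c(x[-1], Inf)], collapse = ","),
  character(1)
)
print(result_strings)
```

Note that this keeps the last element of each run rather than the first, which makes no difference to the output because all elements within a run are equal.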
Bernhard