First, I am new to R, so I am not completely familiar with the syntax of the language. I have a list of data that, for example, looks like this:

1,1,1,1,1,2,2,2,3,3,3,2,2,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,6,6,5,6,5,7,7,7,7

What I want to do is create a new list with only one entry per run of consecutive identical values, so:

1,2,3,2,3,4,5,6,5,6,5,7 (approximately).

I am not quite sure how to go about this. Note that the values may not be integers. Also, if anyone has ideas for doing the same thing with strings or timestamps, suggestions would be appreciated! So far I have been trying to think about it in terms of indexing, but I am having trouble getting it down.

Roland
James
  • It will help if you give an example of what your actual dataset looks like (see how to make a [reproducible example](http://stackoverflow.com/a/5963610/2461552)). If you have a variable that uniquely represents each group in addition to the vector you show, you should easily be able to remove duplicates by group with `duplicated`. – aosmith Sep 25 '14 at 14:49
  • So each variable is unique, but not each group. So all of the sample data points are the same variable. Also, I don't know about using `duplicated`, because I may have measurements later that are equal to a prior one but come from different samples. This means that the specific value must be included in the new list. – James Sep 25 '14 at 14:54
  • Please be precise with terminology. I find it quite unlikely that your data is in a list. It's most likely a vector. Also, what kind of operator is `~=`? It's not part of the R language and neither is `skip`. – Roland Sep 25 '14 at 14:57

1 Answer

Looks like you need the function `rle`. If `x` is your vector of values, then `rle(x)$values` will give you what you want.

values <- c(1,1,1,1,1,2,2,2,3,3,3,2,2,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,6,6,5,6,5,7,7,7,7)
rle(values)$values

## [1] 1 2 3 2 3 4 5 6 5 6 5 7

values <- as.character(values)
rle(values)$values

## [1] "1" "2" "3" "2" "3" "4" "5" "6" "5" "6" "5" "7"
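As a side note, `rle` returns the length of each run as well as its value, which may help if the position where each run begins is also needed. A minimal sketch, assuming a shorter made-up vector; the names `runs` and `starts` are only illustrative:

```r
values <- c(1, 1, 1, 2, 2, 3, 3, 3, 2, 2)
runs <- rle(values)

runs$values    # one entry per run of identical values
runs$lengths   # how many times each of those entries was repeated

# Index in the original vector at which each run starts:
# the first run starts at 1, each later run starts after the previous runs.
starts <- cumsum(c(1, head(runs$lengths, -1)))

# Picking the first element of every run gives the same result as runs$values
values[starts]
```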

ts <- Sys.time()
stamps <- sort(rep(c(ts, ts+1, ts+2, ts+3), 5))
stamps

##  [1] "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:29 EDT"
##  [4] "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:30 EDT"
##  [7] "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:30 EDT"
## [10] "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:31 EDT" "2014-09-25 10:55:31 EDT"
## [13] "2014-09-25 10:55:31 EDT" "2014-09-25 10:55:31 EDT" "2014-09-25 10:55:31 EDT"
## [16] "2014-09-25 10:55:32 EDT" "2014-09-25 10:55:32 EDT" "2014-09-25 10:55:32 EDT"
## [19] "2014-09-25 10:55:32 EDT" "2014-09-25 10:55:32 EDT"

as.POSIXct(rle(as.numeric(stamps))$values, origin = '1970-01-01')

## [1] "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:31 EDT"
## [4] "2014-09-25 10:55:32 EDT"
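Since the question mentions thinking about this in terms of indexing, here is a rough sketch of that approach as an alternative to `rle`: keep the first element, then keep every element that differs from the one before it. The helper name `collapse_runs` is hypothetical (it is not part of base R), and the sketch assumes the vector has no missing values and that exact equality is the right comparison for non-integer data.

```r
# Hypothetical helper: keep one element per run of consecutive identical values.
# Assumes x contains no NA values; comparison uses exact equality.
collapse_runs <- function(x) {
  n <- length(x)
  if (n < 2) return(x)
  # TRUE for the first element, then TRUE wherever an element
  # differs from its immediate predecessor
  x[c(TRUE, x[-1] != x[-n])]
}

# Non-integer values
collapse_runs(c(1.5, 1.5, 2.25, 1.5))

# Timestamps: subsetting keeps the POSIXct class, so no conversion is needed
stamps <- as.POSIXct("2014-09-25 10:55:29", tz = "UTC") + rep(0:3, each = 5)
collapse_runs(stamps)
```

Because this only subsets the original vector, the result has the same type as the input, so it should behave the same way for numbers, strings, and timestamps.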
hrbrmstr
davep
  • Hey, that's pretty nifty! This is what I was originally looking for. I will edit the original post to elaborate on what else I am trying to do. Thanks – James Sep 25 '14 at 14:59
  • @James, please be careful so that your question doesn't become a moving target. – Henrik Sep 25 '14 at 15:09