R how to take one data point from a group of identical data points

Question

First, I am new to R, so I am not completely familiar with the syntax of the language. I have a list of data; for example, it might look like this:

1,1,1,1,1,2,2,2,3,3,3,2,2,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,6,6,5,6,5,7,7,7,7

What I want to do is create a new list with only one entry per group of identical data, so:

1,2,3,2,3,4,5,6,5,6,5,7 (approximately).

I am not quite sure how to go about this. Note that the values may not be integers. Also, if anyone has ideas for doing the same thing with strings or timestamps, suggestions would be appreciated! So far I have been trying to think about it in terms of indexing, but I am having trouble getting it down.

It will help if you give an example of what your actual dataset looks like (see how to make a [reproducible example](http://stackoverflow.com/a/5963610/2461552)). If you have a variable that uniquely represents each group in addition to the vector you show, you should easily be able to remove duplicates by group with `duplicated`. — aosmith, Sep 25 '14 at 14:49
Each variable is unique, but not each group: all of the sample data points are the same variable. Also, I am not sure about using `duplicated`, because later measurements may be equal to an earlier one while still being different samples. Those values must still be included in the new list. — James, Sep 25 '14 at 14:54
Please be precise with terminology. I find it quite unlikely that your data is in a list. It's most likely a vector. Also, what kind of operator is `~=`? It's not part of the R language and neither is `skip`. — Roland, Sep 25 '14 at 14:57
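A small sketch of the distinction James raises, reusing the example vector from the question: `unique(x)` (equivalent to `x[!duplicated(x)]` for a vector) keeps only the first occurrence of each value anywhere in the vector, whereas collapsing consecutive runs keeps a value again whenever it reappears later.

```r
# Example vector from the question
x <- c(1,1,1,1,1,2,2,2,3,3,3,2,2,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,6,6,5,6,5,7,7,7,7)

# unique() keeps the first occurrence of each value in the whole vector,
# so the later runs of 2, 3, 5 and 6 are dropped
unique(x)
## [1] 1 2 3 4 5 6 7

# rle() only merges *consecutive* repeats, so later runs of an earlier value survive
rle(x)$values
## [1] 1 2 3 2 3 4 5 6 5 6 5 7
```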

score 4 · Accepted Answer · edited Sep 25 '14 at 14:57

Looks like you need the function `rle` (run-length encoding). If `x` is your vector of values, then `rle(x)$values` will give you what you want.

values <- c(1,1,1,1,1,2,2,2,3,3,3,2,2,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,6,6,5,6,5,7,7,7,7)
rle(values)$values

## [1] 1 2 3 2 3 4 5 6 5 6 5 7
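For reference, `rle()` returns more than the run values: it also records how long each run is, which can be turned into the position of the first element of each run. That is one way to keep the matching entries of other variables recorded alongside the measurements. This is a sketch; `starts` is just an illustrative name.

```r
values <- c(1,1,1,1,1,2,2,2,3,3,3,2,2,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,6,6,5,6,5,7,7,7,7)
r <- rle(values)

# rle() returns an object of class "rle" holding two vectors:
r$lengths   # how many times each value repeats consecutively
## [1] 5 3 3 2 3 9 4 2 1 1 1 4
r$values    # the value of each run
## [1] 1 2 3 2 3 4 5 6 5 6 5 7

# Index of the first element of each run (end of run minus its length, plus one)
starts <- cumsum(r$lengths) - r$lengths + 1L
starts
## [1]  1  6  9 12 14 17 26 30 32 33 34 35

# The encoding is lossless: inverse.rle() rebuilds the original vector
identical(inverse.rle(r), values)
## [1] TRUE
```

If the data were in a data frame with other columns (a hypothetical `df` whose rows line up with `values`), `df[starts, ]` would keep the first row of each run.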

values <- as.character(values)
rle(values)$values

## [1] "1" "2" "3" "2" "3" "4" "5" "6" "5" "6" "5" "7"
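Since the question notes the values may not be integers: `rle()` compares neighbours with exact equality, so floating-point noise can split what looks like one run. The sketch below uses made-up measurements, and the helper `collapse_runs_tol` and its `tol` parameter are assumptions for illustration, not part of base R. It starts a new run only when a value differs from its predecessor by more than `tol`.

```r
# Hypothetical measurements: 0.1 + 0.2 is not exactly 0.3 in floating point
x <- c(0.1 + 0.2, 0.3, 0.3, 1.5, 1.5, 0.3)

# Exact comparison: the first element forms its own run
rle(x)$values
## [1] 0.3 0.3 1.5 0.3

# Hypothetical helper: keep the first element and every element that differs
# from the previous one by more than 'tol' (choose tol to suit your data)
collapse_runs_tol <- function(x, tol = 1e-8) {
  if (length(x) == 0) return(x)
  x[c(TRUE, abs(diff(x)) > tol)]
}
collapse_runs_tol(x)
## [1] 0.3 1.5 0.3
```

Because each value is compared only with its immediate predecessor, a series that drifts slowly in steps smaller than `tol` would be treated as a single run.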

ts <- Sys.time()
stamps <- sort(rep(c(ts, ts+1, ts+2, ts+3), 5))
stamps

##  [1] "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:29 EDT"
##  [4] "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:30 EDT"
##  [7] "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:30 EDT"
## [10] "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:31 EDT" "2014-09-25 10:55:31 EDT"
## [13] "2014-09-25 10:55:31 EDT" "2014-09-25 10:55:31 EDT" "2014-09-25 10:55:31 EDT"
## [16] "2014-09-25 10:55:32 EDT" "2014-09-25 10:55:32 EDT" "2014-09-25 10:55:32 EDT"
## [19] "2014-09-25 10:55:32 EDT" "2014-09-25 10:55:32 EDT"

as.POSIXct(rle(as.numeric(stamps))$values, origin = '1970-01-01')

## [1] "2014-09-25 10:55:29 EDT" "2014-09-25 10:55:30 EDT" "2014-09-25 10:55:31 EDT"
## [4] "2014-09-25 10:55:32 EDT"
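The question also mentions indexing. Here is a sketch of the same idea with logical indexing instead of `rle()`: keep the first element plus every element that differs from the one before it. Because `!=` and `[` work on date-time vectors and keep their class, this version also handles the timestamps directly, with no round trip through `as.numeric()` and `origin`. The function name `first_of_runs` is hypothetical, and it assumes the vector contains no `NA`s.

```r
# Hypothetical helper: first element of each run of consecutive equal values.
# Works for numeric, character and POSIXct vectors; assumes no NAs.
first_of_runs <- function(x) {
  n <- length(x)
  if (n == 0) return(x)
  x[c(TRUE, x[-1] != x[-n])]
}

first_of_runs(c(1, 1, 2, 2, 1))
## [1] 2 1

ts <- Sys.time()
stamps <- sort(rep(c(ts, ts + 1, ts + 2, ts + 3), 5))
first_of_runs(stamps)   # four timestamps, still of class POSIXct
```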

Hey, that's pretty nifty! This is what I was originally looking for; I will edit the original post to elaborate on what else I am trying to do. Thanks — James, Sep 25 '14 at 14:59
@James, Please be careful so that your question doesn't become a moving target. — Henrik, Sep 25 '14 at 15:09
