
I have a dataset that records people's occupation (what they are doing) every 15 minutes over a complete day. This gives a 96-letter string per day (e.g. ARCCCRTOHGDERRRRYYYIJ...), where each letter stands for one kind of activity.

One of the letters (C) stands for transport from home to work or vice versa, which should let me distinguish time spent at home from time spent at work.

To identify the transport slots, I used:

library(dplyr)
# Positions of every "C" in each day's string, converted to text
Newdata <- Data %>%
  mutate(transport = as.character(gregexpr(pattern = "C", String)))
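The as.character() call above turns each match vector into a single character string, which the numeric tools discussed below (diff, rle) cannot use directly. A minimal sketch in base R that keeps the positions numeric as a list column instead, assuming a data frame called Data with a character column String as in the question; the two example strings are hypothetical stand-ins, with "A" and "W" as placeholder letters.

```r
# Hypothetical stand-in for the question's Data: one day with transport at
# slots 31-33 and 58-59, and one day with no "C" at all.
Data <- data.frame(
  String = c(paste0(strrep("A", 30), strrep("C", 3), strrep("W", 24),
                    strrep("C", 2), strrep("A", 37)),
             strrep("A", 96)),
  stringsAsFactors = FALSE
)

# gregexpr() returns one integer vector of match positions per string;
# a string without any match yields -1, which is mapped here to an empty vector.
# as.integer() also drops the match.length and other attributes.
Data$transport <- lapply(gregexpr("C", Data$String, fixed = TRUE), function(m) {
  if (m[1] == -1) integer(0) else as.integer(m)
})

Data$transport[[1]]
# [1] 31 32 33 58 59
```

Each element of Data$transport is then an ordinary integer vector, so a per-day function can be applied with lapply() or vapply().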

This results in things like:

c(31,32,33,58,59)

In that case I know the person is at home before time slot 31 and after time slot 59. However, some people work nights, which results in:

c(44,45)

And apparently some people go to work, return home and go to work again (or vice versa):

c(7,8, 31,32, 75,76)

What I need is a way to detect that the first vector contains 2 series of consecutive numbers, the second vector only 1 series, and the third vector 3 series.
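One possible alternative, not taken from the question or the answers below, is to skip the position vector and let a regular expression find the runs of C directly in the string: the pattern "C+" matches each maximal run of consecutive C's once. A minimal sketch; make_day is a hypothetical helper that builds test strings matching the three vectors above, using "A" as a placeholder for every other activity.

```r
# Hypothetical helper: builds a 96-slot day string with "C" at the given slots
# and the placeholder letter "A" everywhere else.
make_day <- function(c_slots) {
  chars <- rep("A", 96)
  chars[c_slots] <- "C"
  paste(chars, collapse = "")
}

days <- c(make_day(c(31, 32, 33, 58, 59)),
          make_day(c(44, 45)),
          make_day(c(7, 8, 31, 32, 75, 76)))

# "C+" matches each maximal run of C's; gregexpr() returns -1 for a day without any C.
runs <- gregexpr("C+", days)
n_series <- vapply(runs, function(m) if (m[1] == -1) 0L else length(m), integer(1))
n_series
# [1] 2 1 3

# Start slot and length of each run on the first day
as.integer(runs[[1]])           # 31 58
attr(runs[[1]], "match.length") # 3 2
```

With this approach a single C slot surrounded by other letters also counts as a series of length 1.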

RHertel
Dries

2 Answers


Just use the diff function to calculate the differences between adjacent values:

R> x = c(1, 2, 4, 6, 10)
R> diff(x)
[1] 1 2 2 4

You can then use other functions to interrogate the output. For example, use which to find where the differences equal 1:

R> which(diff(x)==1)
[1] 1

or sum to count them:

R> sum(diff(x) == 1)
[1] 1
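The diff idea can be taken one step further to count the series themselves: in a sorted vector of positions, every difference other than 1 starts a new series, so the number of series is one more than the number of gaps. A minimal sketch, assuming x is increasing without duplicates (as gregexpr positions are); count_series is a hypothetical name.

```r
# Count runs of consecutive integers in a sorted vector:
# each difference other than 1 marks the start of a new run.
count_series <- function(x) {
  if (length(x) == 0) return(0L)  # no positions at all
  1L + sum(diff(x) != 1)
}

count_series(c(31, 32, 33, 58, 59))   # 2
count_series(c(44, 45))               # 1
count_series(c(7, 8, 31, 32, 75, 76)) # 3
count_series(c(5, 31, 32))            # 2: a lone slot counts as its own series
```

Note the design difference from the rle-based answer: that version counts only runs of two or more consecutive numbers, so for c(5, 31, 32) it returns 1, while this sketch also counts the isolated slot 5.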
csgillespie

To count the series of consecutive numbers (the numbers in your vectors are always increasing), you can use:

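# rle() groups equal adjacent differences; each run of 1s is one series of consecutive numbers
foo <- function(x) sum(rle(diff(x))$values == 1)

#> foo(c(31,32,33,58,59))
#[1] 2
#> foo(c(44,45))
#[1] 1
#> foo(c(7,8, 31,32, 75,76))
#[1] 3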
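The question also needs to know where each series starts and ends (e.g. at home before slot 31 and after slot 59). A minimal sketch, assuming a sorted numeric vector of positions: cumsum() over the "gap" indicator gives every series its own id, and split() then groups the positions by that id. series_bounds is a hypothetical name.

```r
# Return one row per series of consecutive numbers, with its first and last value.
series_bounds <- function(x) {
  if (length(x) == 0) return(data.frame(start = numeric(0), end = numeric(0)))
  run_id <- cumsum(c(1, diff(x) != 1))  # increments at every gap
  runs <- split(x, run_id)
  data.frame(start = unname(vapply(runs, min, numeric(1))),
             end   = unname(vapply(runs, max, numeric(1))))
}

b <- series_bounds(c(31, 32, 33, 58, 59))
b$start   # 31 58
b$end     # 33 59
nrow(b)   # number of series, here 2
```

nrow() of the result gives the same count as a diff-based count that includes single isolated slots; whether a given series is a trip to or from work still depends on the surrounding letters in the string.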
Colonel Beauvel