How to collapse data by intervals?

Question

I would like to take a data set like this:

dat <- data.frame(pos = 1:120, state = c(rep("state1", 30), rep("state2",30), rep("state3",30), rep("state1", 30)))

And collapse it into this form:

dat2 <- data.frame(start = seq(1,120,30), end = seq(30,120,30), state = c("state1","state2","state3","state1"))

In summary, I want to know the beginning and end of each run of a category, in the order the runs appear in the data.

What's the output you're trying to get? Is your question not covered by [this one](https://stackoverflow.com/q/9847054/5325862)? – camille, Oct 15 '19 at 17:07
`state = c("state1","state2","state3","state1")` in your grouping column, what's the difference between `state1` and `state1`??? – M--, Oct 15 '19 at 17:11

makeshift-programmer · Answer 1 · 2019-10-15T18:16:35.623

0

You can use `group_by` from dplyr for this purpose, once each contiguous chunk of a state has been given its own id. The code is below:

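library(dplyr)

# Sort by state, then position, so each state's rows sit together
dat1 = dat %>%
  dplyr::arrange(state, pos) %>%
  dplyr::mutate(occurence = 0)

occurence = 0

# Start a new chunk whenever pos jumps by more than 1 from the previous row
for (i in seq_len(nrow(dat1))) {
  if ((i != 1) && ((dat1$pos[i] - dat1$pos[i - 1]) > 1)) {
    occurence = occurence + 1
  }
  dat1$occurence[i] = occurence
}

# One row per (state, chunk) with its first and last position
dat2 = dat1 %>%
  dplyr::group_by(state, occurence) %>%
  dplyr::summarise(start = min(pos, na.rm = TRUE),
                   end = max(pos, na.rm = TRUE)) %>%
  dplyr::arrange(start)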

Let me know if it works.

Output

# A tibble: 4 x 4
# Groups:   state [3]
  state  occurence start   end
  <fct>      <dbl> <int> <int>
1 state1         0     1    30
2 state2         1    31    60
3 state3         1    61    90
4 state1         1    91   120

You can remove the 'occurence' column if required. Use:

dat2 = dat2 %>% dplyr::select(-occurence)
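
For comparison, here is a minimal base R sketch of a loop-free way to number the chunks. It is not the code from this answer: instead of sorting by state and looking for gaps in `pos`, it assumes the rows are already ordered by `pos` and starts a new chunk wherever `state` differs from the previous row. The names `s`, `changed`, `run_id` and `runs` are made up for illustration.

```r
dat <- data.frame(
  pos = 1:120,
  state = rep(c("state1", "state2", "state3", "state1"), each = 30)
)

# Work on character values so the comparison behaves the same for factors
s <- as.character(dat$state)

# TRUE wherever the state differs from the previous row (the first row never counts)
changed <- c(FALSE, s[-1] != s[-length(s)])

# A running count of changes gives each contiguous chunk its own id: 0, 1, 2, 3
run_id <- cumsum(changed)

# First and last pos, and the state, of each chunk
runs <- data.frame(
  start = as.vector(tapply(dat$pos, run_id, min)),
  end   = as.vector(tapply(dat$pos, run_id, max)),
  state = as.vector(tapply(s, run_id, function(x) x[1]))
)
```

Because the chunk id depends only on changes in `state`, this version should also cope with gaps in `pos` inside a chunk, which the gap-based loop would split into separate chunks.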

edited Oct 15 '19 at 18:16

answered Oct 15 '19 at 16:52


Sorry, I realized I typed "state" instead of "state1". There should be only three levels for the variable. – Sergio.pv Oct 15 '19 at 16:56
@Sergio.pv, so in the output, you want "state1" to be present twice? – makeshift-programmer Oct 15 '19 at 16:59
Exactly, I want to know what's in between. – Sergio.pv Oct 15 '19 at 17:08
@Sergio.pv, I have added a for loop to compute `occurence`, which numbers each contiguous chunk of a particular `state`, and then I grouped by both `state` and `occurence`. – makeshift-programmer Oct 15 '19 at 17:28

score 0 · Accepted Answer · answered Oct 15 '19 at 17:37

Using base R, you could use rle:

r <- rle(as.character(dat$state))
end <- cumsum(r$lengths)            # last row of each run
start <- c(1, head(end, -1) + 1)    # one past the previous run's end
data.frame(state = r$values, end = end, start = start)

   state end start
1 state1  30     1
2 state2  60    31
3 state3  90    61
4 state1 120    91
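
Note that the `rle` approach reports row indices, which coincide with `pos` here only because `pos` runs from 1 to 120 without gaps. A minimal sketch of a variant that looks up the actual `pos` values at the run boundaries is given below. It assumes the rows are already sorted by `pos`, and the names `first_row`, `last_row` and `intervals` are chosen for illustration only.

```r
dat <- data.frame(
  pos = 1:120,
  state = rep(c("state1", "state2", "state3", "state1"), each = 30)
)

r <- rle(as.character(dat$state))
last_row  <- cumsum(r$lengths)           # row index where each run ends
first_row <- last_row - r$lengths + 1    # row index where each run starts

# Look up pos at those rows instead of reporting the row indices themselves
intervals <- data.frame(
  start = dat$pos[first_row],
  end   = dat$pos[last_row],
  state = r$values
)
```

With the question's data, `intervals` has the same columns and values as the desired `dat2`.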
