0

I have a dataframe as below. The first column is positions and the last is level. I want to output the range of rows with the same number in adjacent rows. Anything with '2' should be ignored. Can you help

The input:

 1          3
10000       3
20000       3
30000       1
40000       2
50000       2
60000       2
70000       3
80000       1
90000       1

The output
 1-  2999 3
3000-3999 1
7000-7999 3
8000-9999 1
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125

1 Answers1

1

Here's a method using the chaning functions of dplyr. Here's the sample data

dd <- structure(list(pos = c(1L, 10000L, 20000L, 30000L, 40000L, 50000L, 
  60000L, 70000L, 80000L, 90000L), level = c(3L, 3L, 3L, 1L, 2L, 
  2L, 2L, 3L, 1L, 1L)), .Names = c("pos", "level"), class = "data.frame",
  row.names = c(NA, -10L))
dd <- dd[order(dd$pos), ]  #make sure its sorted by position

If the difference to your next pos is always 10000, you can do

library(dplyr)
dd %>% arrange(pos) %>% 
    mutate(run=cumsum(c(0,diff(level))!=0)) %>% 
    subset(level!=2) %>% 
    group_by(run) %>%
    summarise(level=max(level), start=min(pos), end=max(pos)+9999) %>%
    select(-run)

#   level start   end
# 1     3     1 29999
# 2     1 30000 39999
# 3     3 70000 79999
# 4     1 80000 99999

otherwise

dd %>% arrange(pos) %>% 
    mutate(run=cumsum(c(0,diff(level))!=0), nextpos=lead(pos)) %>% 
    subset(level!=2) %>% 
    group_by(run) %>%
    summarise(level=max(level), start=min(pos), end=max(nextpos)-1) %>%
    select(-run)

#   level start   end
# 1     3     1 29999
# 2     1 30000 39999
# 3     3 70000 79999
# 4     1 80000    NA

can calculate the distance to the next group for all but the last group.

MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • OK that first answer is pretty much what I wanted. For fear of being shouted at can I add a twister? What if there is another column called home. Each home has start and end positions so that I'm hoping to get the range start and end positions specific to home. Ive put the examples in the edited code above – Sebastian Zeki Dec 16 '14 at 08:58
  • @user3632206 `group_by(home)` at the top (right after `dd %>% `) doesn't do? – Khashaa Dec 16 '14 at 09:06