Re-arrange data frame according to index column

Question

I have a data frame as shown below. I want to re-arrange the data frame based on 'Break Point' column.

The expected result should be as below

Can you explain this a bit better? Is it that every time you get a Y in "breakpoint" you want to reverse all the rows between that row and the previous breakpoint? What happens at the end? Why are there two rows with the 21,28 data — Spacedman, Apr 13 '19 at 08:20
@Spacedman - All the rows will be reversed whenever there is a breakpoint. Additionally, I want to append all the rows (if any) present below the last breakpoint as shown in the output. — Prateek, Apr 13 '19 at 08:25
You want the rows *below the last breakpoint* to be included *after a repeat of the last breakpoint*? Oh that's sad because I have a really neat solution if this irregularity wasn't present. — Spacedman, Apr 13 '19 at 09:03
Can you help me with the solution if there was not irregularity ? I can externally add all the rows present below last breakpoint. Should not be a concern for me. — Prateek, Apr 13 '19 at 09:06

Spacedman · Answer 1 · 2019-04-13T09:36:47.360

With this sample data:

df <- data.frame(
    Range1 = c(1, 2, 3, 5, 10, 12, 16, 20, 21, 28, 33),
    Range2 = c(2, 3, 5, 10, 12, 16, 20, 21, 28, 33, 40),
    Breakpoint = c("", "", "", "Y", "", "Y", "", "", "Y", "", ""))

A solution with the trailing bits cut of is:

First cut off the dangling bits:

 df2 = df[1:max(which(df$Breakpoint=="Y")),]

Then work out length of each group:

> rgroup=rle(rev(cumsum(rev(df2$Break=="Y"))))$lengths

Get where the Y are:

> Ypos = which(df2$Breakpoint=="Y")

Construct an index vector that is the Y positions minus a reversed sequence from 1 to the length of the chunk. Subset:

> df2[rep(Ypos, rgroup) - unlist(lapply(rgroup,function(x){1:x})) +1,]
  Range1 Range2 Breakpoint
4      5     10          Y
3      3      5           
2      2      3           
1      1      2           
6     12     16          Y
5     10     12           
9     21     28          Y
8     20     21           
7     16     20

Add the dangling bits back on if needed.

[edit - new version above added. Code below for historical purposes]

My old version was this, and dealt with the dangling bits:

> group=rev(cumsum(rev(df$Break=="Y")))
> rbind(do.call(rbind,lapply(split(df[group>0,],-group[group>0]),function(x){x[nrow(x):1,,drop=FALSE]}))[,c("Range1","Range2")],df[max(which(df$Break=="Y")),1:2,drop=FALSE],df[group==0,1:2])

and get:

     Range1 Range2
-3.4      5     10
-3.3      3      5
-3.2      2      3
-3.1      1      2
-2.6     12     16
-2.5     10     12
-1.9     21     28
-1.8     20     21
-1.7     16     20
9        21     28
10       28     33
11       33     40

If you don't like the row names then drop them. Uses only base R functions.

I'm not sure if this works if there's no trailing matter after the last break but you've not specified the problem well if that can happen.

Bonus Annotated version:

> group=rev(cumsum(rev(df$Break=="Y")))

This creates a vector that starts at 0 for the last row and increases each time it finds a Y. Reverse that to get a grouping variable for the chunks up to each Y.

This bit wont work if cutpaste because of the commenting I'm about to do:

> rbind(

# we need to bind three things. The reversed chunks, the last break point and   
# the trailing stuff:

      do.call(

# the trailing stuff is the rbind of the reversed chunks:

          rbind,

#           split the data into a list of chunks 

             lapply(
               split(df[group>0,],-group[group>0]),

     # reverse them

                  function(x){x[nrow(x):1,,drop=FALSE]}
     # and only take the columns we need:
        ))[,c("Range1","Range2")],
  # this is the last Y
      df[max(which(df$Break=="Y")),1:2,drop=FALSE],

  # this is the trailing rows, get them in order they appear:

      df[group==0,1:2])

Annotating it out like that has let me see some optimisations that could be made but that's all for now.

score 0 · Answer 2 · answered Apr 13 '19 at 08:16

Depending on the size of your data.frame this could be achieved quite manually by a for loop.

BreakPoints <- which(!is.na(DF$`break point`))
if(length(breakPoints) > 0){
    startIndex <- 1 #Startindex tells me where i should point the breakPoint
    for(i in breakPoints){ #Iterate over breakpoints
        #Put the break point at the startIndex row 
        DF[startIndex:i,] <- DF[c(i, startIndex:(i-1), ] 
        #Update the placement as the next block 
        startIndex <- i + 1
    }
}

if your data is large, there is likely a more efficient method. In general subsetting via [<-.dataframe is slow compared to other methods. An initial optimizer could be simply converting the above code to data.table format, where subsetting is much much faster.

Re-arrange data frame according to index column

2 Answers2

Bonus Annotated version: