How to change data frame by getting some specific rows repeated?

Question

    a      b     c             d
   5015  3.49 1059.500       0.00
   5023  2.50 6056.000       2.50
   5024  3.00 1954.500       3.00
   5026  3.49 1163.833       0.00
   5037  2.50 6797.000       2.50
   5038  3.00 2109.000       3.00
   5040  2.50 4521.000       2.50
   5041  3.33 2469.000       3.33

I want to insert a copy of the most recently observed row where d is zero before every row where d is non-zero, so that the result alternates between a zero-d row and a non-zero-d row. The repeated zero-d row must always be the one observed most recently.

Output I want is:

   a     b    c              d
  5015  3.49 1059.500       0.00    
  5023  2.50 6056.000       2.50    
  5015  3.49 1059.500       0.00    
  5024  3.00 1954.500       3.00    
  5026  3.49 1163.833       0.00    
  5037  2.50 6797.000       2.50    
  5026  3.49 1163.833       0.00    
  5038  3.00 2109.000       3.00    
  5026  3.49 1163.833       0.00    
  5040  2.50 4521.000       2.50    
  5026  3.49 1163.833       0.00    
  5041  3.33 2469.000       3.33

Don't repost your questions: http://stackoverflow.com/questions/36019820/how-to-repeat-rows-with-0-price-from-dataframe-before-non-zero-price-row — rawr, Mar 16 '16 at 17:24

Pierre L · Accepted Answer · 2016-03-16T17:33:58.603

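We can create a custom function f that interleaves the first row of a group with each of the remaining rows. We split the data frame on cumsum(df1$d == 0), which starts a new group at every row where d equals 0, apply f to each group, and combine the pieces with do.call(rbind, ...). The optional `row.names<-`(..., NULL) call resets the row names that rbind would otherwise build from the group labels: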

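# Pair row 1 of a group with each later row: indices 1, 2, 1, 3, 1, 4, ...
f <- function(x) x[c(rbind(rep(1, nrow(x) - 1), 2:nrow(x))), ]
# Split at each d == 0 row, interleave within each group, then stack and reset row names
`row.names<-`(do.call(rbind, lapply(split(df1, cumsum(df1$d == 0)), f)), NULL)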
#       a    b        c    d
# 1  5015 3.49 1059.500 0.00
# 2  5023 2.50 6056.000 2.50
# 3  5015 3.49 1059.500 0.00
# 4  5024 3.00 1954.500 3.00
# 5  5026 3.49 1163.833 0.00
# 6  5037 2.50 6797.000 2.50
# 7  5026 3.49 1163.833 0.00
# 8  5038 3.00 2109.000 3.00
# 9  5026 3.49 1163.833 0.00
# 10 5040 2.50 4521.000 2.50
# 11 5026 3.49 1163.833 0.00
# 12 5041 3.33 2469.000 3.33

The function relies on an interleaving trick. Try c(rbind(c(1,1,1), c(2,3,4))) to see how the numbers are woven together.
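As a small illustration of the two building blocks used in f and in the split, here is a sketch in base R. The variable names (first, rest, woven, grp) are made up for this example, and d is just the d column from the question typed in by hand:

```r
# Interleave two index vectors: rbind() stacks them as the rows of a matrix,
# and c() reads that matrix back column by column.
first <- c(1, 1, 1)
rest  <- c(2, 3, 4)
woven <- c(rbind(first, rest))
woven
# [1] 1 2 1 3 1 4

# Grouping index: the running sum goes up by one at every d == 0 row,
# so each zero row and the non-zero rows after it share a group number.
d <- c(0, 2.5, 3, 0, 2.5, 3, 2.5, 3.33)
grp <- cumsum(d == 0)
grp
# [1] 1 1 1 2 2 2 2 2
```

Within each group, the woven indices 1, 2, 1, 3, ... are exactly the row order the question asks for.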

Here are some other interleaving tricks, for folks' reference: http://stackoverflow.com/q/16443260/1191259 — Frank, Mar 16 '16 at 17:01

Roland · Answer 2 · 2016-03-16T17:16:31.823

The data.table package's grouping with by is useful here:

library(data.table)
DF <- fread("a b c d
5015 3.49 1059.500 0.00
5023 2.50 6056.000 2.50
5024 3.00 1954.500 3.00
5026 3.49 1163.833 0.00
5037 2.50 6797.000 2.50
5038 3.00 2109.000 3.00
5040 2.50 4521.000 2.50
5041 3.33 2469.000 3.33")

# Find the row indices per group; groups start at each d = 0 row.
# The tolerance guards against floating point errors if d was calculated.
idx <- DF[, {
  ind <- .I[rep(1L, (.N - 1) * 2)]  # first repeat the group's first index
  ind[c(FALSE, TRUE)] <- .I[-1]     # then replace every second repeat with the other indices
  ind
}, by = cumsum(abs(d) < .Machine$double.eps^0.5)][["V1"]]

DF[idx]  # subset with the indices

#        a    b        c    d
#  1: 5015 3.49 1059.500 0.00
#  2: 5023 2.50 6056.000 2.50
#  3: 5015 3.49 1059.500 0.00
#  4: 5024 3.00 1954.500 3.00
#  5: 5026 3.49 1163.833 0.00
#  6: 5037 2.50 6797.000 2.50
#  7: 5026 3.49 1163.833 0.00
#  8: 5038 3.00 2109.000 3.00
#  9: 5026 3.49 1163.833 0.00
# 10: 5040 2.50 4521.000 2.50
# 11: 5026 3.49 1163.833 0.00
# 12: 5041 3.33 2469.000 3.33
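The floating point warning in the code comments can be shown with a tiny example. This is only a sketch: d_calc and tol are names invented here, and the arithmetic is a stand-in for whatever calculation might have produced d:

```r
# A value that is mathematically zero but not exactly zero in floating point
d_calc <- 0.1 + 0.2 - 0.3
d_calc == 0
# [1] FALSE

# Comparing against a small tolerance, as in the by expression above
tol <- .Machine$double.eps^0.5
abs(d_calc) < tol
# [1] TRUE
```

If d is read straight from a file, as in this question, d == 0 would probably behave the same; the tolerance matters mainly when d comes out of arithmetic.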
