R for loop wise : Rowwise sum on conditions : Performance issue

Question

I have a database, where I am running code to change value of a cell-based on the sum of previous cells and the sum of succeeding cells in the same row.

for (i in 1:row1) 
  
{      
  for(j in 3:col-1) 
  {       # for-loop over columns
    
      if (as.numeric(rowSums(e[i,2:j])) == 0 ) 
      {
        
        e1[i,j] <- 0
        
      }
      else if (as.numeric(rowSums(e[i,2:j])) > 0 &&  e[i,j] == 0 && as.numeric(rowSums(e[i,j:col])) > 0 ) 
      {
        e1[i,j] <- 1
      }
      
      else if (as.numeric(rowSums(e[i,2:j])) > 0 &&  e[i,j] == 1 && as.numeric(rowSums(e[i,j:col])) > 0 )    
      {
        e1[i,j] <- 0
        
      }        
      
    }
    
}

The runtime is very high. Appreciate any suggestions to improve the speed. Additional info: copying new values into the data frame is being done.

Thanks, Sandy

edit 2:

Sample data:

structure(list(`Sr no` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 16, 17, 18, 19), `2018-01` = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `2018-02` = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `2018-03` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `2018-04` = c(0, 
0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `2018-05` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0), `2018-06` = c(0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0), `2018-07` = c(0, 
0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0), `2018-08` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1), `2018-09` = c(0, 
0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0), `2018-10` = c(1, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1), `2018-11` = c(0, 
1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1), `2018-12` = c(1, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), `2019-01` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0), `2019-02` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-19L), class = c("tbl_df", "tbl", "data.frame"))

for loops and nested ifs are usually not the fastest option. But since you do not provide any sample data of `e`, `e1`, `col` and `row1` to work with, i doubt that anyone can provide a solid answer.. — Wimpel, Jul 13 '21 at 13:13
thank you for the revert. I have added an image of the sample databse and additional info based on your suggestions — Sandy, Jul 13 '21 at 13:44
the data provided is not real sample data... It is a picture of data... I cannot read this into R. Please create a minimal sample set (does not have to be production data), and post it using `dput(e)`, or if too lagre, `dput(head(e))`. This output can be copy-pasted into your question (within a code-block), and after that directly into R. Do the same for your desired output `e1`. more info on how to create a minimal sample set: https://stackoverflow.com/a/5963610/6356278 — Wimpel, Jul 13 '21 at 14:13
@Ronak Shah 1 represents "On the service" for customers. 0 represents "off the service". I want to highlight the pauses between any two subsequent "on the services". If they are no "1" in succeeding month, the customer is gone away — Sandy, Jul 14 '21 at 06:43

score 0 · Accepted Answer · answered Jul 13 '21 at 16:27

0

I think you can do this with matrix logic. Depends if you have enough RAM.

# creating fake data
# nc <- 300 # number of columns
nc <- 10 # for testing
nn <- 1e6 # rows
e <- sapply(1:nc, function(x) sample.int(2, nn, replace = T) - 1L) 
e <- as.data.frame(e)

row1 <- nrow(e)
colc <- ncol(e)
# note that:
3:colc-1
# isnt equal with:
3:(colc-1)

s <- 3:(colc-1) # I assume you meant this
e1 <- matrix(nrow = row1, ncol = length(s)) # empty resulting matrix
s1 <- sapply(s, function(j) rowSums(e[, 2:j])) # sum for each relevant i,j
s2 <- sapply(s, function(j) rowSums(e[, j:colc])) # sum for each relevant i,j

e2 <- as.matrix(e[, s]) # taking relevant columns of e

e1[s1 == 0] <- 0
e1[s1 > 0 &  e2 == 0 & s2 > 0] <- 1
e1[s1 > 0 &  e2 == 1 & s2 > 0] <- 0

answered Jul 13 '21 at 16:27

minem

3,640
2
15
29

Thanks, It reduced the time taken by half. – Sandy Jul 14 '21 at 08:52
@Sandy you could also look into `rowSums2` from `matrixStats` package. It could give some improvements. – minem Jul 16 '21 at 09:23
thanks for the suggestion. I will look into it. I will raise a new query if faced with any issues. Thanks again. – Sandy Jul 16 '21 at 12:16

R for loop wise : Rowwise sum on conditions : Performance issue

1 Answers1