I am looking for a faster way to do what the following code does, since my actual dataset is very large, and I would like to get rid of the for loop altogether. I want to duplicate the rows of xdf into a new data frame, once for each column in values. Then, next to each entry in the new data frame, I want to show the row sums of values from column 1 up to column j.

library(tibble)  # provides tibble(); data_frame() is deprecated

xdf <- tibble(
  x = c('a', 'b', 'c'),
  y = c(4, 5, 6)
)

values <- tibble(
  col_1 = c(5, 9, 1),
  col_2 = c(4, 7, 6),
  col_3 = c(1, 5, 2),
  col_4 = c(7, 8, 5)
)

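for (j in seq_len(ncol(values))) {
  # Copy of xdf with z = row sums of columns 1..j of values
  block <- cbind(xdf, z = rowSums(values[1:j]))
  if (j == 1) {
    Temp <- block
  } else {
    # Stack the new block under the rows accumulated so far
    Temp <- rbind(Temp, block)
  }
}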

print(Temp)

The output should be:

   x y  z
1  a 4  5
2  b 5  9
3  c 6  1
4  a 4  9
5  b 5 16
6  c 6  7
7  a 4 10
8  b 5 21
9  c 6  9
10 a 4 17
11 b 5 29
12 c 6 14

Is there a shorter way to accomplish this?

The closest answer I could find on SO is "How to expand data frame based on values?".

I am new to R, so sorry for the long-winded code.

Ruan

1 Answer

Here's one base R option:

Repeat the rows of xdf once for each column in values. Then compute rowSums over a growing set of columns (the first column, then the first two, and so on) and add the result as a new column of the final data frame.

newdf <- xdf[rep(seq(nrow(xdf)), ncol(values)), ]
newdf$z <- c(sapply(seq(ncol(values)), function(x) rowSums(values[1:x])))
newdf

# A tibble: 12 x 3
#   x         y     z
#   <chr> <dbl> <dbl>
# 1 a         4     5
# 2 b         5     9
# 3 c         6     1
# 4 a         4     9
# 5 b         5    16
# 6 c         6     7
# 7 a         4    10
# 8 b         5    21
# 9 c         6     9
#10 a         4    17
#11 b         5    29
#12 c         6    14
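
To make the two steps above easier to follow, here is a small sketch that breaks them into intermediate objects. The names `sums`, `z` and `idx` are introduced only for illustration, and base `data.frame()` stands in for the tibbles from the question so that the snippet needs no extra packages.

```r
# Assumption: base data frames instead of the question's tibbles
xdf <- data.frame(x = c("a", "b", "c"), y = c(4, 5, 6))
values <- data.frame(
  col_1 = c(5, 9, 1),
  col_2 = c(4, 7, 6),
  col_3 = c(1, 5, 2),
  col_4 = c(7, 8, 5)
)

# Column j of `sums` holds the row sums of values[1:j]
sums <- sapply(seq_len(ncol(values)), function(j) rowSums(values[1:j]))
dim(sums)  # one row per row of xdf, one column per column of values

# c() drops the dimensions and reads the matrix column by column,
# which is the order in which the repeated rows of xdf are stacked
z <- c(sums)

# Row indices of xdf, repeated once per column of values
idx <- rep(seq_len(nrow(xdf)), times = ncol(values))
newdf <- xdf[idx, ]
newdf$z <- z
rownames(newdf) <- NULL  # replace 1.1, 2.1, ... with 1..n
newdf
```

Because `c()` flattens in column-major order, the first block of `z` belongs to `col_1`, the second to `col_1 + col_2`, and so on, which matches the block order produced by the loop in the question.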

A concise one-liner, suggested by @sindri_baldur, avoids repeating the rows explicitly: cbind() recycles the rows of xdf to match the length of z.

cbind(xdf, z = c(sapply(seq(ncol(values)), function(x) rowSums(values[1:x]))))
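
Since the question is about speed, one possible variation (not part of the answer above, and not benchmarked here) is to build the running sums with `Reduce(..., accumulate = TRUE)`, so that each column of values is added once instead of recomputing `rowSums` for every prefix. This is a sketch that assumes every column of values is numeric; the names `running`, `z` and `result` are illustrative, and base `data.frame()` again stands in for the tibbles.

```r
# Assumption: base data frames instead of the question's tibbles
xdf <- data.frame(x = c("a", "b", "c"), y = c(4, 5, 6))
values <- data.frame(
  col_1 = c(5, 9, 1),
  col_2 = c(4, 7, 6),
  col_3 = c(1, 5, 2),
  col_4 = c(7, 8, 5)
)

# Running sums over the columns:
# col_1, col_1 + col_2, col_1 + col_2 + col_3, ...
running <- Reduce(`+`, values, accumulate = TRUE)

# Concatenate the per-column vectors into one long vector
z <- unlist(running, use.names = FALSE)

# Repeat the rows of xdf once per column of values and attach z
result <- xdf[rep(seq_len(nrow(xdf)), times = length(running)), ]
result$z <- z
rownames(result) <- NULL
result
```

Whether this is actually faster on a large dataset would need to be measured on the real data.
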
Ronak Shah
You could skip the first step and rely on recycling: `cbind(xdf, z = c(sapply(seq(ncol(values)), function(x) rowSums(values[1:x]))))`. – s_baldur Feb 22 '21 at 09:47