
In R, I am trying to create a MONTH column to plot my data with. It should number the rows of each population in order, using another column (ORIG_ROW) that has the same value on every row of a given population. For example:

NAME ORIG_ROW MONTH
POP1 1        1
POP1 1        2
POP1 1        3
POP2 2        1
POP2 2        2
POP2 2        3

I am able to do this with:

df$MONTH <- sapply(1:nrow(df), function(i) (colSums(df[0:i, c('ORIG_ROW') == df$ORIG_ROW[i]))

However, this code is inefficient when I try to apply it to a large dataset (~825k observations).
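For reference, here is a runnable version of that row-by-row loop. It adds the missing `]` and `)`, uses the one-based `seq_len(i)` instead of `0:i`, and uses `sum()` instead of `colSums()`. On a plain data.frame, `df[rows, 'ORIG_ROW']` drops to a vector, and `colSums()` raises an error on a vector. The `df` built here is a minimal reconstruction of the example table, with NAME stored as character for simplicity. Each iteration rescans rows 1 through i, so the total work grows roughly with the square of the number of rows. That is why the loop gets slow at ~825k observations.

```r
# Minimal reconstruction of the example table from the question
df <- data.frame(
  NAME = c("POP1", "POP1", "POP1", "POP2", "POP2", "POP2"),
  ORIG_ROW = c(1L, 1L, 1L, 2L, 2L, 2L)
)

# Running count: for row i, count how many of rows 1..i share row i's
# ORIG_ROW value. sum() of a logical vector returns an integer count.
df$MONTH <- sapply(seq_len(nrow(df)), function(i) {
  sum(df$ORIG_ROW[seq_len(i)] == df$ORIG_ROW[i])
})

print(df$MONTH)
```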

Does anyone have suggestions on how to make this code more efficient?

K. Jean
  • Take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It's going to be hard for people to answer this question without a minimal example of your starting data and expected results. – divibisan Aug 06 '18 at 17:27
  • R is one-based, so `0:i` becomes `1:i`. Also, do you want `MONTH` to have consecutive values, `1:n` where `n` is the number of rows of each group of `ORIG_ROW`? – Rui Barradas Aug 06 '18 at 17:33
  • @RuiBarradas yep, I noticed that it worked the same with 0:i and 1:i; I will adjust my code since R is one-based. And yes, I would like 'MONTH' to have consecutive values as you say. – K. Jean Aug 06 '18 at 17:44
  • Right now your "able to do with" code doesn't run - you're missing a `]` and a `)`. – DanY Aug 06 '18 at 17:47

1 Answer


What you want can be done with a single call to `ave`, grouping the ORIG_ROW column by itself and applying `seq_along` within each group.

df$MONTH <- with(df, ave(ORIG_ROW, ORIG_ROW, FUN = seq_along))
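Here is a quick sketch, using hypothetical random data, that checks whether `ave(x, x, FUN = seq_along)` gives the same running count as the question's row-by-row loop. It uses shuffled, non-contiguous groups. `ave` splits the values by group, applies `seq_along` to each piece, and writes the results back in the original row order. It therefore avoids rescanning all earlier rows for every row.

```r
set.seed(1)  # reproducible random groups

# Hypothetical test data: group ids in shuffled, non-contiguous order
x <- sample(1:50, 2000, replace = TRUE)

# Quadratic running count, as in the question's loop
loop_month <- sapply(seq_along(x), function(i) sum(x[seq_len(i)] == x[i]))

# Grouped sequence with ave(): numbers each row within its group,
# preserving the original row order
ave_month <- ave(x, x, FUN = seq_along)

print(identical(loop_month, ave_month))
```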

DATA.

df <-
structure(list(NAME = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("POP1", 
"POP2"), class = "factor"), ORIG_ROW = c(1L, 1L, 1L, 2L, 2L, 
2L)), row.names = c(NA, -6L), class = "data.frame")
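If you would rather define the groups by population name than by ORIG_ROW, the same idea works by grouping on NAME. This sketch assumes that each NAME identifies exactly one population, as in the example data. A row-index vector is passed as the first argument because `ave` writes FUN's results back into a copy of that argument. Passing the factor itself would try to store integers into a factor and produce NA values with warnings.

```r
# Same example data as above, built with factor() for readability
df <- data.frame(
  NAME = factor(c("POP1", "POP1", "POP1", "POP2", "POP2", "POP2")),
  ORIG_ROW = c(1L, 1L, 1L, 2L, 2L, 2L)
)

# Group directly by NAME; seq_len(nrow(df)) only supplies an integer
# vector of the right length for ave() to fill in
df$MONTH <- ave(seq_len(nrow(df)), df$NAME, FUN = seq_along)

print(df$MONTH)
```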
Rui Barradas