2

I am trying to use dplyr's mutate() function to create new variables that depend on the previous row values of succeeding new variables.

I've searched SO with different terms to see if something clicks, and the closest that I've come is this answer. This is a rough structure of the tibble tib I have:

library(dplyr)
library(magrittr)

tib <- tribble(
  ~ID,
  "A1",
  "A2",
  "A3",
  "A4",
  "A5",
  "A1", 
  "B1",
  "B2",
  "B3"
)

I want to use mutate() to be able to generate the columns x, y and z:

tib %<>%
  mutate(
    x = if_else(ID == "A1", 2, lag(y) + lag(z)),
    y = if_else(ID == "A1", 3, x + lag(z)),
    z = if_else(ID == "A1", 7, lag(z))
  )

For instance, for the values shown above, I would want the output to be:

|  ID  |   x   |   y   |   z   |
--------------------------------
|  A1  |   2   |   3   |   7   |
|  A2  |   10  |   17  |   7   |
|  A3  |   24  |   31  |   7   |
|  A4  |   38  |   45  |   7   |
|  A5  |   52  |   59  |   7   |
|  A1  |   2   |   3   |   7   |
|  B1  |   10  |   17  |   7   |
|  B2  |   24  |   31  |   7   |
|  B3  |   38  |   45  |   7   |
--------------------------------

The problem with this method is that mutate() returns an error:

Error in lag(y) : object 'y' not found

I understand that I'm referring to y and z before I initiate them. If x was something that depended only on itself as in the linked question above, I could've done it in multiple passes as suggested there, but I don't think that's possible here.

As suggested in the comments to my question, if I try to initialize those values (so that y and z are known), as shown below,

tib %<>%
  mutate(
    x = if_else(ID == "A1", 2, 0),
    y = if_else(ID == "A1", 3, 0),
    z = if_else(ID == "A1", 7, 0)
  )

tib %<>%
  mutate(
    x = if_else(ID == "A1", 2, lag(y) + lag(z)),
    y = if_else(ID == "A1", 3, x + lag(z)),
    z = if_else(ID == "A1", 7, lag(z))
  )

The tibble that I get is as follows:

# A tibble: 9 x 4
  ID        x     y     z
  <chr> <dbl> <dbl> <dbl>
1 A1        2     3     7
2 A2       10    17     7
3 A3        0     0     0
4 A4        0     0     0
5 A5        0     0     0
6 A1        2     3     7
7 B1       10    17     7
8 B2        0     0     0
9 B3        0     0     0

which is different from what I expect to get (perhaps because mutate() evaluates all variables column-wise so y and z are 0)?

To hopefully put it in clearer terms, I want to be able to calculate values for each of the new columns based on the values present in the previous rows of said columns - they will always have some row where they do get initial values, but how do I make it flow to the lower rows?

If it helps, this is how I would want it to work in Excel (I'm just beginning to learn R).

I would like to continue using dplyr to maintain consistency with my code.

  • I'm confused. What's in the columns x, y, and z in `tib` if you say they aren't initiated? Does `tib` not actually look like the table you have printed here? – camille May 03 '19 at 20:18
  • I can't tell what your desired output is from that `mutate()` statement because it not working. So what exactly is the rule for generating each value in your desired output? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Your table with the fake values isn't helpful because you never use those values anywhere. – MrFlick May 03 '19 at 20:18
  • 1
    So maybe you initialize the columns with those values (2, 3, 7), then replace them with the `if_else` call in `mutate`? Like define `tib` with `x = 2, y = 3, z = 7` before doing anything else – camille May 03 '19 at 20:40
  • I've edited my question thoroughly in the hope that I was able to get a reproducible example in place, @MrFlick. I've also added in the suggestion that camille suggested. – skedaddle_waznook May 03 '19 at 21:02

1 Answers1

3

mutate() doesn't really work with iterated formulas like Excel. It operates column-wise so having communication between each row iteration isn't easy. In this particular case, your function have simple non-recursive defintions. Here's a wrapper function that encapsulates the non-iterative versions

my_mutate <- function(data, x0, y0, z0) {
  mutate(data, 
    n = 1:n(),
    x = if_else(n==1, x0, y0 + z0*(n-1)),
    y = if_else(n==1, y0, y0 + z0*2*(n-1)),
    z = z0,
    n = NULL
  )
}

Then we can perform the within-group transformations with

tib %>% group_by(grp=cumsum(ID=="A1")) %>% 
  my_mutate(x0=2, y0=3, z0=7) %>% 
  ungroup %>% select(-grp)
#   ID        x     y     z
#   <chr> <dbl> <dbl> <dbl>
# 1 A1        2     3     7
# 2 A2       10    17     7
# 3 A3       17    31     7
# 4 A4       24    45     7
# 5 A5       31    59     7
# 6 A1        2     3     7
# 7 B1       10    17     7
# 8 B2       17    31     7
# 9 B3       24    45     7

It's much easier to with with non-iterative definitions when they exist.

MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • I'm trying to understand your multiplication of `z0*(n-1)` and `z0*2*(n-1)`. I get that it's acting of some sort of lag but can't wrap my mind around it. – dylanjm May 03 '19 at 21:29
  • Would it be best then to resort to `for` loops in these cases? Also, as mentioned above by dylanjm, how did the multiplication of `z0` by `(n-1)` and `2 * (n - 1)` work? How do I derive that relationship? – skedaddle_waznook May 03 '19 at 21:29
  • Oh, I get it now. `y = x + lag(z) = y0 + z0 + z0` for the first one and if generalized, `= y0 + z0 * (n -2)`, and `x = lag(y) + lag(z)` when generalized for the n^th `x`, `= y0 + z0 * (n - 1)`. – skedaddle_waznook May 03 '19 at 21:37
  • In general, you just need to do a bit of math/algebra to translate to a non-iterative definition (though this might not always be possible). You might consider using for loops in this case, but trying to do that with `mutate()` probably isn't going to be pleasant. This really isn't the type of transformation that `dplyr` is best able to make easier. – MrFlick May 03 '19 at 21:54