0

I would like to perform the following calculation on many columns at the same time while they are grouped by ID:

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04))

The data set entails a column for Assets.YYYY.MM and Returns.YYYY.MM for each month from 2018.04 to 2022.02 and I would like to create a Flows column for each of those.

I know that I could do it like this for every column:

df <- df %>%
  group_by(Id) %>%
  mutate("Flows.2018.04"= Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate("Flows.2018.05"= Assets.2018.05 - 
           (Assets.2018.04 * Returns.2018.05))

But as I want to do this calculation for 50+ columns I was hoping there is a more elegant way. To my knowledge it should be possible with the dplyr across() function but I was not able to figure out how to do this.

I would like the new columns to be named Flows.YYYY.MM which complicates the issue further. I thought that the easiest way to achieve this might be to simply rename the columns after creating them.

I have also thought about converting the data frame from wide format to long format to perform this calculation, however this seemed even more complicated to me.

Any suggestions on achieving the desired outcome?

Please find below the sample data, as requested:

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

df
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55           5000           2345           3459            1.03            0.94
2 6F55           3000           1926           6933            0.77            1.11
3 ANE3           5870           8563           1533            1.01            0.89
4 ANE3           4098           9373           4556            0.97            1.02
5 6F55           9878           7432           9855            1.06            1.02

The desired outcome is:

  ID    Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55            5000           2345           3459            1.03            0.94        -2805          1255
2 6F55            3000           1926           6933            0.77            1.11         -384          4795
3 ANE3            5870           8563           1533            1.01            0.89         2634         -6088
4 ANE3            4098           9373           4556            0.97            1.02         5398         -5004
5 6F55            9878           7432           9855            1.06            1.02        -3039          2274
Kosc0001
  • 25
  • 5
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Most likely you'll want to reshape your data to make this easier. – MrFlick Apr 11 '22 at 00:56
  • 1
    I think it'll be a little more than just `mutate(across(.., ~ . - .*.))`, since you are referencing multiple _different_ columns in each calculation. It's not impossible to do, but it's not as direct as just `across(..)`. FYI, the way it appears you have named your columns might suggest storing your data in a _long_ format vice the _wide_ format it is in now. While not a hard-and-fast rule, the appearance of `.YEAR.MONTH` in the column *name* often indicates this. It often takes a shifting of how you think about data, and it is not always necessary, but ... thoughts. Sample data, please. – r2evans Apr 11 '22 at 01:04
  • As suggested I have added sample data to reproduce the issue. – Kosc0001 Apr 11 '22 at 01:21

1 Answers1

3

How about this:

  library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098 ,9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))


df %>% 
  pivot_longer(-ID, 
               names_to = c(".value", "date"), 
               names_pattern= "(.*)\\.(\\d{4}\\.\\d{2})") %>% 
  arrange(ID, date) %>% 
  group_by(ID, date) %>% 
  mutate(obs = seq_along(date)) %>% 
  group_by(ID, obs) %>% 
  mutate(Flow = Assets - (lag(Assets)*Returns)) %>% 
  pivot_wider(names_from = "date", 
              values_from = c("Assets", "Returns", "Flow")) %>% 
  as.data.frame()
#>     ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55   1           5000           2345           3459              NA
#> 2 6F55   2           3000           1926           6933              NA
#> 3 6F55   3           9878           7432           9855              NA
#> 4 ANE3   1           5870           8563           1533              NA
#> 5 ANE3   2           4098           9373           4556              NA
#>   Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1            1.03            0.94           NA     -2805.00      1254.70
#> 2            0.77            1.11           NA      -384.00      4795.14
#> 3            1.06            1.02           NA     -3038.68      2274.36
#> 4            1.01            0.89           NA      2634.30     -6088.07
#> 5            0.97            1.02           NA      5397.94     -5004.46

Created on 2022-04-10 by the reprex package (v2.0.1)

DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25