divide columns element with a specific value of another column after a specific day in R

Question

structure(list(datetime = structure(c(16512, 16513, 16514, 16517, 
16518, 16519, 16520, 16521, 16524, 16525, 16526, 16527, 16531, 
16532, 16533, 16534, 16535, 16538, 16539, 16540), class = "Date"), 
    asset = c("004125", "004125", "004125", "004125", "004125", 
    "004125", "004125", "004125", "004125", "004125", "004125", 
    "004125", "004125", "004125", "004125", "004125", "004125", 
    "004125", "004125", "004125"), investor = c("Q1762", "Q1762", 
    "Q1762", "Q1762", "Q1762", "Q1762", "Q1762", "Q1762", "Q1762", 
    "Q1762", "Q1762", "Q1762", "Q1762", "Q1762", "Q1762", "Q1762", 
    "Q1762", "Q1762", "Q1762", "Q1762"), marketprice = c(163, 
    165, 165, 165, 550, 555, 595, 540, 555, 555, 555, 534, 543, 
    165, 165, 165, 168.1, 168.1, 168.1, 168.1), corrprice = c(163, 
    165, 165, 165, 165, 555, 595, 540, 555, 555, 555, 534, 543, 
    165, 165, 165, 168.1, 168.1, 168.1, 168.1), ab_check = c(FALSE, FALSE, FALSE, FALSE, TRUE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, 
    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), adjustm = c(1, 
    1, 1, 1, 3.33, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1), returns = c(NA, 
    0.0122699386503067, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0.0187878787878788, 0, 0, 0)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    asset = "004125", .rows = structure(list(1:20), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

Hello, I have this dataframe in which I have different asset. I need R when ab_check == TRUE, to take all the following observation, hence for all the next datetime, for that asset and divide the marketprice for the adjustm value corresponding to the ab_check == TRUE row. So basically I want R to do: if ab_check == TRUE ==> marketprice(i)/adjustm(when ab_check == TRUE) where marketprice(i) must have a datetime >= datetime(ab_check == TRUE)

I'm learning dplyr and so I would prefer a dplyr solution if possible. Thank you!

Hi Lorenzo, can you please edit your post using this information: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Ben, May 31 '22 at 15:23

Carl · Answer 1 · 2022-06-01T08:59:48.497

Updated to use the marketprice where there is no adjustment.

In the example below:

Asset 50 row 3 & 4 prices are halved because row 2 is ab_check == TRUE with an adjustment of 2.
Asset 51 has no ab_check == TRUE in the group, so prices remain unadjusted
Asset 52 row 2 & 3 prices are halved because row 1 is ab_check == TRUE with an adjustment of 2.

If you need further help with this, then please post a new question with what you have so far and minimal data that shows the problem you want to address.

library(tidyverse)
library(lubridate)

df <- tribble(
  ~datetime, ~asset, ~marketprice, ~ab_check, ~adjustm,
  "2016-02-24", "6C1850", 201, NA, NA,
  "2016-02-25", "6C1850", 301,  TRUE, 2,
  "2016-02-26", "6C1850", 200, FALSE, 1,
  "2016-02-29", "6C1850", 300, FALSE, 1,
  "2016-03-01", "6C1851", 201, FALSE, 1,
  "2016-03-02", "6C1851", 301, FALSE, 1,
  "2016-03-08", "6C1851", 400, NA, NA,
  "2016-03-09", "6C1852", 101, TRUE, 2,
  "2016-03-10", "6C1852", 100, FALSE, 1,
  "2016-03-11", "6C1852", 200, FALSE, 1
) |> 
  mutate(datetime = ymd(datetime))

df |> 
  arrange(asset, datetime) |> 
  replace_na(list(ab_check = FALSE)) |> 
  group_by(asset) |> 
  mutate(rows_to_mod = max(ab_check),
         rows_to_mod = if_else(lag(ab_check) == TRUE, 1, NA_real_),
         new_adj = if_else(lag(ab_check) == TRUE, lag(adjustm), NA_real_)
         ) |> 
  fill(rows_to_mod, new_adj) |> 
  replace_na(list(new_adj = 1)) |> 
  mutate(new_col = marketprice / new_adj)
#> # A tibble: 10 × 8
#> # Groups:   asset [3]
#>    datetime   asset  marketprice ab_check adjustm rows_to_mod new_adj new_col
#>    <date>     <chr>        <dbl> <lgl>      <dbl>       <dbl>   <dbl>   <dbl>
#>  1 2016-02-24 6C1850         201 FALSE         NA          NA       1     201
#>  2 2016-02-25 6C1850         301 TRUE           2          NA       1     301
#>  3 2016-02-26 6C1850         200 FALSE          1           1       2     100
#>  4 2016-02-29 6C1850         300 FALSE          1           1       2     150
#>  5 2016-03-01 6C1851         201 FALSE          1          NA       1     201
#>  6 2016-03-02 6C1851         301 FALSE          1          NA       1     301
#>  7 2016-03-08 6C1851         400 FALSE         NA          NA       1     400
#>  8 2016-03-09 6C1852         101 TRUE           2          NA       1     101
#>  9 2016-03-10 6C1852         100 FALSE          1           1       2      50
#> 10 2016-03-11 6C1852         200 FALSE          1           1       2     100

^{Created on 2022-06-01 by the reprex package (v2.0.1)}

almost there but not enough. In particular is the lag(ab_check) which is not ok here, since I need to update all the following rows for that asset after an ab_check == TRUE, for all my asset (except 50) there is only one ab_check == TRUE hence after that, maybe 500 days, i need to use the calculation for all those 500 days. Thank you for the effort anyway! — Lorenzo Mazzucchelli, May 31 '22 at 16:17
It is updating all the following rows for that asset after the `ab_check == TRUE`. The `lag` and the `fill` on the following line together ensure that it's all the following rows for that asset. Note I did put 3 different assets in the sample data to check that. If I'm mis-reading your comment, then I think it would really help if you could show your expected output too to make it clear. — Carl, May 31 '22 at 16:44
my bad, i misread that line. ok so i think what you did is kind of correct, but when I try to apply to my dataset i get this error `Error in mutate(): ! Problem while computing calc = if_else(...). i The error occurred in group 1: asset = "004125". Caused by error in if_else(): ! true must be length 685 (length of condition) or one, not 0.` — Lorenzo Mazzucchelli, May 31 '22 at 17:10
Could you add some rows causing the problem to your example data (currently 20 observations)? The code runs okay for me with those 20. If your dataset is several hundred rows, try the code after say `dplyr::slice(1:50)` or `dplyr::slice(51:100)` etc. Maybe you can find a small sample which generates the issue. — Carl, May 31 '22 at 17:25
A possible cause might be if you have an asset where none of the rows (for that asset) has an `ab_check == TRUE` row. — Carl, May 31 '22 at 17:28
i surely have those. how to solve that? for some asset I will have zero TRUE, for those I just need to copy unaltered the price column — Lorenzo Mazzucchelli, Jun 01 '22 at 07:27
it still gives me the error `Error in mutate(): ! Problem while computing new_adj = if_else(lag(ab_check) == TRUE, lag(adjustm), NA_real_). i The error occurred in group 1: investor = "412Z8", asset = "006420". Caused by error in lag(): ! object adjustm not found` which i don't know ho to fix — Lorenzo Mazzucchelli, Jun 01 '22 at 21:19

divide columns element with a specific value of another column after a specific day in R

1 Answers1