I have the following dataset:

df <- structure(list(Data = structure(c(1623888000, 1629158400, 1629158400
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Client = c("Client1", 
"Client1", "Client1"), Fund = c("Fund1", "Fund1", "Fund2"), Nature = c("Application", 
"Rescue", "Application"), Quantity = c(433.059697, 0, 171.546757
), Value = c(69800, -70305.67, 24875), `NAV Yesterday` = c(162.40991399996, 
162.40991399996, 145.044589000056), `NAV in Application Date` = c(161.178702344125, 
162.346370458944, 145.004198476337), `Var NAV` = c(0.00763879866215962, 
0.00039140721678275, 0.000278547270652531), `Var * Value` = c(533.188146618741, 
-27.5181466187465, 6.92886335748171), FinalValue = c(70333.1881466187, 
-70333.1881466187, 24881.9288633575), `Rentability WRONG` = c(0.0210345899274819, 
0.0210345899274819, 0.0210345899274819)), row.names = c(NA, -3L
), class = c("tbl_df", "tbl", "data.frame"))

What I need to do is:

If Quantity = 0 in a row, remove all rows with the same Fund name as that row, but only the rows whose Date is less than or equal to the Date of the Quantity = 0 row.

3 Answers


You can try -

library(dplyr)

df %>%
  filter({
    #Row index where Quantity = 0
    inds = which(Quantity == 0)
    #Drop rows where Data value is less than or equal to Data value at Quantity = 0
    #and Fund is same as present at Quantity = 0.
    !(Data <= Data[inds] & Fund %in% Fund[inds])
  }) 
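Run end to end on the sample data, this keeps only the Fund2 row (a minimal check; the stand-in tibble below rebuilds only the columns the filter touches, and the approach assumes a single Quantity == 0 row so that inds has length one):

```r
library(dplyr)

# Stand-in for the question's data, keeping only the columns the filter uses
df <- tibble::tibble(
  Data = as.POSIXct(c("2021-06-17", "2021-08-17", "2021-08-17"), tz = "UTC"),
  Fund = c("Fund1", "Fund1", "Fund2"),
  Quantity = c(433.059697, 0, 171.546757)
)

result <- df %>%
  filter({
    inds <- which(Quantity == 0)
    !(Data <= Data[inds] & Fund %in% Fund[inds])
  })

result$Fund
# "Fund2" -- both Fund1 rows are dated on or before the Quantity == 0 row
```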
Ronak Shah

What I did here is:

  1. I grouped the data by Fund
  2. Arranged each group by Data
  3. Created a column zero_point that assigns 1 to the row where Quantity == 0 and NA otherwise
  4. Filled the fields in zero_point that come before the actual "zero point" with the same value.
  5. Filtered those rows out.
library(dplyr)
library(tidyr) # fill() comes from tidyr

output <- df %>% 
  group_by(Fund) %>% 
  arrange(Data) %>% 
  mutate(zero_point = case_when(Quantity == 0 ~ 1)) %>%
  fill(zero_point, .direction = "up") %>% 
  filter(is.na(zero_point))

Result

(This assumes there is only one row where Quantity is 0 per Fund group.)
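If a Fund can contain more than one Quantity == 0 row, one way to relax that assumption (my variant, not part of the answer above) is to drop everything dated on or before the latest zero row in each group:

```r
library(dplyr)

# Hypothetical data with two Quantity == 0 rows in the same Fund
df <- tibble::tibble(
  Data = as.POSIXct(c("2021-06-17", "2021-07-01", "2021-08-17", "2021-09-01"),
                    tz = "UTC"),
  Fund = "Fund1",
  Quantity = c(433, 0, 0, 100)
)

output <- df %>%
  group_by(Fund) %>%
  filter(!any(Quantity == 0) | Data > max(Data[Quantity == 0])) %>%
  ungroup()

output$Quantity
# 100 -- only the row dated after the latest zero row survives
```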


Here's a thought:

df %>%
  group_by(Fund) %>%
  filter(!any(Quantity == 0) | Data <= Data[which.min(Quantity)])
# # A tibble: 3 x 12
# # Groups:   Fund [2]
#   Data                Client Fund  Nature Quantity   Value `NAV Yesterday` `NAV in Applica~ `Var NAV` `Var * Value` FinalValue `Rentability WR~
#   <dttm>              <chr>  <chr> <chr>     <dbl>   <dbl>           <dbl>            <dbl>     <dbl>         <dbl>      <dbl>            <dbl>
# 1 2021-06-17 00:00:00 Clien~ Fund1 Appli~     433.  69800             162.             161.  0.00764         533.       70333.           0.0210
# 2 2021-08-17 00:00:00 Clien~ Fund1 Rescue       0  -70306.            162.             162.  0.000391        -27.5     -70333.           0.0210
# 3 2021-08-17 00:00:00 Clien~ Fund2 Appli~     172.  24875             145.             145.  0.000279          6.93     24882.           0.0210
  • I'm assuming you meant "Data <= Data of the Quantity = 0 Fund", so I used Data rather than Date (not found in the data) or NAV in Application Date.

  • This filters nothing in this sample data, I'm hoping the logic is correct.

  • Testing for equality with floating-point (numeric) values can be problematic at times (see Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754). If you have some small near-zero numbers, this will silently produce counter-intuitive results without warning or error. You might want to be more defensive and use something like:

    df %>%
      group_by(Fund) %>%
      filter(all(abs(Quantity) > 0) | Data <= Data[which.min(Quantity)])
    

    or even

    df %>%
      group_by(Fund) %>%
      filter(all(abs(Quantity) > 0) | 
               row_number() == which.min(Quantity) |
               Data < Data[which.min(Quantity)])
    

    While the latter is a bit paranoid (and double-calculates which.min(.)), it should not succumb to problems with equality tests.

  • The only time this will fail is if all(is.na(Quantity)); that is, which.min(c(NA, NA)) returns integer(0), which will cause an error in dplyr::filter. One might add a safeguard with something like filter(any(!is.na(Quantity)) & (...)).
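One way to sketch that safeguard (my phrasing: note that a plain `&` guard can still hand filter() a zero-length condition, since `&` in R is not short-circuiting, so this sketch branches explicitly on the all-NA case):

```r
library(dplyr)

# Hypothetical data: one Fund whose Quantity is entirely NA, one normal Fund
df <- tibble::tibble(
  Data = as.POSIXct(c("2021-06-17", "2021-08-17", "2021-08-17"), tz = "UTC"),
  Fund = c("FundNA", "FundNA", "Fund2"),
  Quantity = c(NA_real_, NA_real_, 171.546757)
)

out <- df %>%
  group_by(Fund) %>%
  filter(if (all(is.na(Quantity))) FALSE else
           (all(abs(Quantity) > 0) | Data <= Data[which.min(Quantity)])) %>%
  ungroup()

out$Fund
# "Fund2" -- the all-NA group is dropped instead of erroring on which.min()
```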

r2evans
  • That's very relevant, but when I apply it to my dataset, `group_by(Fund)` erases the funds with Q = 0 for all clients, and `group_by(Client)` erases all clients who have rescued a fund completely (who have Q = 0). I'll try running your solution inside a for loop that breaks clients into a list to see what I get. – Vinícius Felizatti Aug 23 '21 at 16:29
  • Sure, but ... are we supposed to infer from its presence in your data that `Client` is a necessary grouping variable? Have you tried `group_by(Fund, Client)`? Your data is too sparse to be able to do much more than guess; if `Client` is relevant, then please include enough sample data so that it varies as well as `Fund`. And, unless it is applicable, we really don't need all of the other extraneous columns; please include only what is relevant to the question and your understanding of the data. – r2evans Aug 23 '21 at 16:31
  • Yeah this was totally my fault, should have put a larger dataset to exemplify my problem. – Vinícius Felizatti Aug 23 '21 at 16:33