I have the following dataset:

df <- structure(list(Data = structure(c(1623888000, 1629158400, 1629158400
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Client = c("Client1", 
"Client1", "Client1"), Fund = c("Fund1", "Fund1", "Fund2"), Nature = c("Application", 
"Rescue", "Application"), Quantity = c(433.059697, 0, 171.546757
), Value = c(69800, -70305.67, 24875), `NAV Yesterday` = c(162.40991399996, 
162.40991399996, 145.044589000056), `NAV in Application Date` = c(161.178702344125, 
162.346370458944, 145.004198476337), `Var NAV` = c(0.00763879866215962, 
0.00039140721678275, 0.000278547270652531), `Var * Value` = c(533.188146618741, 
-27.5181466187465, 6.92886335748171), FinalValue = c(70333.1881466187, 
-70333.1881466187, 24881.9288633575), `Rentability WRONG` = c(0.0210345899274819, 
0.0210345899274819, 0.0210345899274819)), row.names = c(NA, -3L
), class = c("tbl_df", "tbl", "data.frame"))

What I need to do is:

If Quantity = 0 in a row, remove all rows with the same Fund name as that row, but only the rows whose Date is less than or equal to the Date of the Quantity = 0 row.

3 Answers


You can try -

library(dplyr)

df %>%
  filter({
    #Row index where Quantity = 0
    inds = which(Quantity == 0)
    #Drop rows where Data value is less than or equal to Data value at Quantity = 0
    #and Fund is same as present at Quantity = 0.
    !(Data <= Data[inds] & Fund %in% Fund[inds])
  }) 
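Run end to end on the sample data, this keeps only the Fund2 row (a minimal check; the stand-in tibble below rebuilds only the columns the filter touches, and the approach assumes a single Quantity == 0 row so that inds has length one):

```r
library(dplyr)

# Stand-in for the question's data, keeping only the columns the filter uses
df <- tibble::tibble(
  Data = as.POSIXct(c("2021-06-17", "2021-08-17", "2021-08-17"), tz = "UTC"),
  Fund = c("Fund1", "Fund1", "Fund2"),
  Quantity = c(433.059697, 0, 171.546757)
)

result <- df %>%
  filter({
    inds <- which(Quantity == 0)
    !(Data <= Data[inds] & Fund %in% Fund[inds])
  })

result$Fund
# "Fund2" -- both Fund1 rows are dated on or before the Quantity == 0 row
```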
Ronak Shah

What I did here is:

  1. I grouped the data by Fund
  2. Arranged each group by Data
  3. Created a column zero_point that assigns 1 to the row where Quantity == 0 and NA otherwise
  4. Filled the fields in zero_point that come before the actual "zero point" with the same value.
  5. Filtered those rows out.
library(dplyr)
library(tidyr) # fill() comes from tidyr

output <- df %>% 
  group_by(Fund) %>% 
  arrange(Data) %>% 
  mutate(zero_point = case_when(Quantity == 0 ~ 1)) %>%
  fill(zero_point, .direction = "up") %>% 
  filter(is.na(zero_point))

Result

(This assumes there is only one row where Quantity is 0 per Fund group.)
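If a Fund can contain more than one Quantity == 0 row, one way to relax that assumption (my variant, not part of the answer above) is to drop everything dated on or before the latest zero row in each group:

```r
library(dplyr)

# Hypothetical data with two Quantity == 0 rows in the same Fund
df <- tibble::tibble(
  Data = as.POSIXct(c("2021-06-17", "2021-07-01", "2021-08-17", "2021-09-01"),
                    tz = "UTC"),
  Fund = "Fund1",
  Quantity = c(433, 0, 0, 100)
)

output <- df %>%
  group_by(Fund) %>%
  filter(!any(Quantity == 0) | Data > max(Data[Quantity == 0])) %>%
  ungroup()

output$Quantity
# 100 -- only the row dated after the latest zero row survives
```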


Here's a thought:

df %>%
  group_by(Fund) %>%
  filter(!any(Quantity == 0) | Data <= Data[which.min(Quantity)])
# # A tibble: 3 x 12
# # Groups:   Fund [2]
#   Data                Client Fund  Nature Quantity   Value `NAV Yesterday` `NAV in Applica~ `Var NAV` `Var * Value` FinalValue `Rentability WR~
#   <dttm>              <chr>  <chr> <chr>     <dbl>   <dbl>           <dbl>            <dbl>     <dbl>         <dbl>      <dbl>            <dbl>
# 1 2021-06-17 00:00:00 Clien~ Fund1 Appli~     433.  69800             162.             161.  0.00764         533.       70333.           0.0210
# 2 2021-08-17 00:00:00 Clien~ Fund1 Rescue       0  -70306.            162.             162.  0.000391        -27.5     -70333.           0.0210
# 3 2021-08-17 00:00:00 Clien~ Fund2 Appli~     172.  24875             145.             145.  0.000279          6.93     24882.           0.0210
  • I'm assuming you meant "Data <= Data of the Quantity = 0 Fund", so I used Data rather than Date (not found in the data) or NAV in Application Date.

  • This filters nothing in this sample data, I'm hoping the logic is correct.

  • Testing for equality with floating-point (numeric) values can be problematic at times (see Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754). If you have some small near-zero numbers, this will silently produce counter-intuitive results without warning or error. You might want to be more defensive and use something like:

    df %>%
      group_by(Fund) %>%
      filter(all(abs(Quantity) > 0) | Data <= Data[which.min(Quantity)])
    

    or even

    df %>%
      group_by(Fund) %>%
      filter(all(abs(Quantity) > 0) | 
               row_number() == which.min(Quantity) |
               Data < Data[which.min(Quantity)])
    

    While the latter is a bit paranoid (and double-calculates which.min(.)), it should not succumb to problems with equality tests.

  • The only time this will fail is if all(is.na(Quantity)); that is, which.min(c(NA, NA)) returns integer(0), which will cause an error in dplyr::filter. One might add a safeguard with something like filter(any(!is.na(Quantity)) & (...)).
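One way to sketch that safeguard (my phrasing: note that a plain `&` guard can still hand filter() a zero-length condition, since `&` in R is not short-circuiting, so this sketch branches explicitly on the all-NA case):

```r
library(dplyr)

# Hypothetical data: one Fund whose Quantity is entirely NA, one normal Fund
df <- tibble::tibble(
  Data = as.POSIXct(c("2021-06-17", "2021-08-17", "2021-08-17"), tz = "UTC"),
  Fund = c("FundNA", "FundNA", "Fund2"),
  Quantity = c(NA_real_, NA_real_, 171.546757)
)

out <- df %>%
  group_by(Fund) %>%
  filter(if (all(is.na(Quantity))) FALSE else
           (all(abs(Quantity) > 0) | Data <= Data[which.min(Quantity)])) %>%
  ungroup()

out$Fund
# "Fund2" -- the all-NA group is dropped instead of erroring on which.min()
```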

r2evans
  • That's very relevant, but when I apply it to my dataset, `group_by(Fund)` erases the funds with Q = 0 for all clients, and `group_by(Client)` erases all clients who have rescued a fund completely (who have Q = 0). I'll try running your solution inside a for loop that breaks clients into a list to see what I get. – Vinícius Felizatti Aug 23 '21 at 16:29
  • Sure, but ... are we supposed to infer from its presence in your data that `Client` is a necessary grouping variable? Have you tried `group_by(Fund, Client)`? Your data is too sparse to be able to do much more than guess; if `Client` is relevant, then please include enough sample data so that it varies as well as `Fund`. And, unless it is applicable, we really don't need all of the other extraneous columns; please include only what is relevant to the question and your understanding of the data. – r2evans Aug 23 '21 at 16:31
  • Yeah this was totally my fault, should have put a larger dataset to exemplify my problem. – Vinícius Felizatti Aug 23 '21 at 16:33