0

I would like to know how to remove the rows with the latest 3 max dates. I get my data through an API so it's dynamic and can't just filter rows from my data frame with specific dates because the latest dates keep changes.

My Data looks like this:

 date       ptppAll mAverage
   <date>       <dbl>    <dbl>
 1 2020-03-01    1.71    NA   
 2 2020-03-02    7.82    NA   
 3 2020-03-03    9.81    NA   
 4 2020-03-04    1.71     4.23
 5 2020-03-05    3.42     4.72
 6 2020-03-06    0        4.68
 7 2020-03-07    5.13     6.19
 8 2020-03-08    5.13     6.53
 9 2020-03-09    7.54     6.53
10 2020-03-10   20.4      8.04

In the above data, assuming that 2020-03-10 is the max date, I'd like to remove it alongside rows containing dates 2020-03-09, 2020-03-08. So from my example, the code would like this:

   date       ptppAll mAverage
   <date>       <dbl>    <dbl>
 1 2020-03-01    1.71    NA   
 2 2020-03-02    7.82    NA   
 3 2020-03-03    9.81    NA   
 4 2020-03-04    1.71     4.23
 5 2020-03-05    3.42     4.72
 6 2020-03-06    0        4.68
 7 2020-03-07    5.13     6.19
aqkhan
  • 66
  • 8

3 Answers3

1

try:

library(dplyr)

data <- structure(list(date = c("2020-03-01", "2020-03-02", "2020-03-03", 
                        "2020-03-04", "2020-03-05", "2020-03-06", "2020-03-07", "2020-03-08", 
                        "2020-03-09", "2020-03-10"), 
                       ptppAll = c(1.71, 7.82, 9.81, 1.71, 3.42, 0, 5.13, 5.13, 7.54, 20.4),
                       mAverage = c(NA, NA, NA, 4.23, 4.72, 4.68, 6.19, 6.53, 6.53, 8.04)), class = "data.frame",
                       row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

data %>% arrange(date) %>% head(-3)
#>         date ptppAll mAverage
#> 1 2020-03-01    1.71       NA
#> 2 2020-03-02    7.82       NA
#> 3 2020-03-03    9.81       NA
#> 4 2020-03-04    1.71     4.23
#> 5 2020-03-05    3.42     4.72
#> 6 2020-03-06    0.00     4.68
#> 7 2020-03-07    5.13     6.19

Created on 2020-08-20 by the reprex package (v0.3.0)

Waldi
  • 39,242
  • 6
  • 30
  • 78
  • Thank you! but how do you deal with it when you have months worth of data. One would not be able to put that into the structure like you have done. – aqkhan Aug 20 '20 at 09:45
  • The structure is only there to make your example reproducible, and to demonstrate that the solution works. Of course you don't need it in real life ;). I got it using [dput](https://stackoverflow.com/a/49995752/13513328), but it would have been easier if you did it directly in your question. – Waldi Aug 20 '20 at 09:48
  • If you have month of data, just take the head of it as an example : `dput(head(data,8))` – Waldi Aug 20 '20 at 09:55
0

You simply can use filter:

library(tidyverse)

data <- structure(list(date = c("2020-03-01", "2020-03-02", "2020-03-03", 
                            "2020-03-04", "2020-03-05", "2020-03-06", "2020-    03-07", "2020-03-08", 
                            "2020-03-09", "2020-03-10"), 
                   ptppAll = c(1.71, 7.82, 9.81, 1.71, 3.42, 0, 5.13, 5.13, 7.54, 20.4),
                   mAverage = c(NA, NA, NA, 4.23, 4.72, 4.68, 6.19, 6.53, 6.53, 8.04)), class = "data.frame",
              row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))


data %>%
  filter(date %in% head(sort(unique(.$date)), -3))

# date ptppAll mAverage
# 1 2020-03-01    1.71       NA
# 2 2020-03-02    7.82       NA
# 3 2020-03-03    9.81       NA
# 4 2020-03-04    1.71     4.23
# 5 2020-03-05    3.42     4.72
# 6 2020-03-06    0.00     4.68
# 7 2020-03-07    5.13     6.19
r.user.05apr
  • 5,356
  • 3
  • 22
  • 39
0

Using base R

head(data[order(data$date),], -3)

data

data <- structure(list(date = c("2020-03-01", "2020-03-02", "2020-03-03", 
                            "2020-03-04", "2020-03-05", "2020-03-06", "2020-    03-07", "2020-03-08", 
                            "2020-03-09", "2020-03-10"), 
                   ptppAll = c(1.71, 7.82, 9.81, 1.71, 3.42, 0, 5.13, 5.13, 7.54, 20.4),
                   mAverage = c(NA, NA, NA, 4.23, 4.72, 4.68, 6.19, 6.53, 6.53, 8.04)), class = "data.frame",
              row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
akrun
  • 874,273
  • 37
  • 540
  • 662