0

Hi I have a dataframe for example:

Order Number    Date
4378345         2020-01-02
4324375         2020-02-03

Now I want to subset this Dataframe only on orders with Date greater than last 2 month from today's Date, So that when I put it on automation the code it automatically take's last two month to till date.

Any help would be appreciated.

EDIT: My apologies i think i Should have been more clear as by last two month i mean if the date is today 2020-03-16, I wound want my data to be from 2020-02-01 to till date

r_a
  • 107
  • 6

5 Answers5

4
library(lubridate)
subset(your_data, Date > today() - months(2))

This assumes your date column is of class Date already.

In general, months are not super well-defined... you may want to use a more deterministic criterion. For example, what is 2 months before April 28, April 29, April 30, May 1? Keep in mind that February had 29 days this year. You can see lubridate's opinion with (as.Date("2020-04-28") + 0:3) - months(2), which is NA in the case of 2020-04-30. Doing 60 days before (Date > today() - days(60)) or some other better-defined criteria will give you more consistency.


To go from the first day of the previous month, use the following code. This is well-defined, as all months have a first day.

subset(your_data, Date >= floor_date(today(), unit = "month") - months(1))
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Hi thank you for your answer. actually when i run your code it is giving me data from 2020-02-16. My apologies i think i Should have been more clear as by last two month i mean if the date is today 2020-03-16, I wound want my data to be from 2020-02-01 to till date – r_a Mar 16 '20 at 13:08
  • 1
    For that, use `Date >= floor_date(today(), unit = "month") - months(1)`. That will be well-defined because all months have a first day. – Gregor Thomas Mar 16 '20 at 13:10
  • Thank you so much .. by running this code tab2 <- subset(dump_raw, dump_raw$date >= floor_date(today(), unit = "month") - months(0)).... I'm able to get the data for only march .. Now If i want to get the data of only feb ,, what changes should i apply ? – r_a Mar 16 '20 at 13:34
  • 1
    For only the current month `subset(dump_raw, date >= floor_date(today(), unit = "month"))` (you don't need the `- months(0)`), for only previous month `subset(dump_raw, date >= floor_date(today(), unit = "month") - months(1), date < floor_date(today(), unit = "month")))` – Gregor Thomas Mar 16 '20 at 13:43
  • Thank you so much for you help mate :) – r_a Mar 16 '20 at 15:27
3

You can use the package lubridate with something like:

library(lubridate)

subset(yourDF, 
       Date >= (today() - months(2)))

[edit: Ouch, someone was faster)

  • Hi thank you for your answer. actually when i run your code it is giving me data from 2020-02-16. My apologies i think i Should have been more clear as by last two month i mean if the date is today 2020-03-16, I wound want my data to be from 2020-02-01 to till date – r_a Mar 16 '20 at 13:08
2
library(dplyr)
library(lubridate)

x <- tibble(Date = as_date(c("2020-01-02", "2020-02-03")))

x %>% filter(Date >= today() - months(2))
knytt
  • 583
  • 5
  • 15
2
time <- read.table(textConnection("
OrderNumber    Date
4378345 2020-01-02
4324375 2020-02-03"), header = TRUE)

time <- as.data.frame(time)

library(lubridate)
library(dplyr)

time2 <- time %>% 
            filter(Date >= today() - months(2))
  • Thank you so much .. by running this code tab2 <- subset(dump_raw, dump_raw$date >= floor_date(today(), unit = "month") - months(0)).... I'm able to get the data for only march .. Now If i want to get the data of only feb ,, what changes should i apply ? – r_a Mar 16 '20 at 13:38
1

In base R you can find the first of last month using seq (credits to @G.Grothendieck), and replacing the day with the first using strftime.

last.1st <- 
  as.Date(paste0(strftime(seq(today, length=2, by="-1 month")[2], format="%Y-%m"), "-01"))

The first of current month can be obtained using gsub and regular expressions.

curr.1 <- as.Date(gsub("\\d{2}$", "01", Sys.Date()))

Then subsetting as usual is straightforward. The whole period:

dat[dat$date >= last.1, ]
#    order.num       date
# 18    432174 2020-02-01
# 19    432175 2020-02-03
# 20    432176 2020-02-05
# 21    432177 2020-02-07
# 22    432178 2020-02-09
# 23    432179 2020-02-11
# 24    432180 2020-02-13
# 25    432181 2020-02-15
# 26    432182 2020-02-17
# 27    432183 2020-02-19
# 28    432184 2020-02-21
# 29    432185 2020-02-23
# 30    432186 2020-02-25
# 31    432187 2020-02-27
# 32    432188 2020-02-29
# 33    432189 2020-03-02
# 34    432190 2020-03-04
# 35    432191 2020-03-06
# 36    432192 2020-03-08
# 37    432193 2020-03-10
# 38    432194 2020-03-12
# 39    432195 2020-03-14
# 40    432196 2020-03-16

And just the last month:

dat[dat$date >= last.1 & dat$date <= curr.1, ]
#    order.num       date
# 18    432174 2020-02-01
# 19    432175 2020-02-03
# 20    432176 2020-02-05
# 21    432177 2020-02-07
# 22    432178 2020-02-09
# 23    432179 2020-02-11
# 24    432180 2020-02-13
# 25    432181 2020-02-15
# 26    432182 2020-02-17
# 27    432183 2020-02-19
# 28    432184 2020-02-21
# 29    432185 2020-02-23
# 30    432186 2020-02-25
# 31    432187 2020-02-27
# 32    432188 2020-02-29

Toy data

dat <- structure(list(order.num = c(432157, 432158, 432159, 432160, 
432161, 432162, 432163, 432164, 432165, 432166, 432167, 432168, 
432169, 432170, 432171, 432172, 432173, 432174, 432175, 432176, 
432177, 432178, 432179, 432180, 432181, 432182, 432183, 432184, 
432185, 432186, 432187, 432188, 432189, 432190, 432191, 432192, 
432193, 432194, 432195, 432196), date = structure(c(18259, 18261, 
18263, 18265, 18267, 18269, 18271, 18273, 18275, 18277, 18279, 
18281, 18283, 18285, 18287, 18289, 18291, 18293, 18295, 18297, 
18299, 18301, 18303, 18305, 18307, 18309, 18311, 18313, 18315, 
18317, 18319, 18321, 18323, 18325, 18327, 18329, 18331, 18333, 
18335, 18337), class = "Date")), class = "data.frame", row.names = c(NA, 
-40L))
jay.sf
  • 60,139
  • 8
  • 53
  • 110