7

I have a vector of dates e.g.

dates <- c('2013-01-01', '2013-04-02', '2013-06-10', '2013-09-30')

And a dataframe which contains a date column e.g.

df <- data.frame(
                'date' = c('2013-01-04', '2013-01-22', '2013-10-01', '2013-10-10'),
                'a'    = c(1,2,3,4),
                'b'    = c('a', 'b', 'c', 'd')
                )

And I would would like to subset the dataframe so it only contains rows where the date is less than 5 days after any of the dates in the 'dates' vector.

i.e. The initial dataframe looks like this

date       a b 
2013-01-04 1 a
2013-01-22 2 b
2013-10-01 3 c
2013-10-10 4 d

After the query I would only be left with the first and third row (since 2013-01-04 is within 5 days of 2013-01-01 and 2013-10-01 is within 5 days of 2013-09-30)

Does anyone know of the best way to do this?

Thanks in advance

joran
  • 169,992
  • 32
  • 429
  • 468
user1165199
  • 6,351
  • 13
  • 44
  • 60

3 Answers3

5

This is easy (and very fast) to do with a data.table roll:

library(data.table)
dt = data.table(df)

# convert to Date (or IDate) to have numbers instead of strings for dates
# also set the key for dates for the join
dt[, date := as.Date(date)]
dates = data.table(date = as.Date(dates), key = 'date')

# join with a roll of 5 days, throwing out dates that don't match
dates[dt, roll = 5, nomatch = 0]
#         date a b
#1: 2013-01-04 1 a
#2: 2013-10-01 3 c
eddi
  • 49,088
  • 6
  • 104
  • 155
  • Thanks, I have begun to use data tables more and know how good they can be so this looks like it may be the best and quic kest way of doing it however copying your code exactly (and my set up above) returns an empty data table for me... Any idea what I may be doing wrong? – user1165199 Oct 07 '13 at 16:32
  • @user1165199 you're getting an empty `data.table` with example in OP? that's very strange - maybe restart your session in case you overwrote smth? – eddi Oct 07 '13 at 16:37
  • It is still empty, however when I increase roll to be something massive (10000000000) it returns all rows. Could it be that my dates are stored as datetime or something and it is adding on 5 seconds instead of 5 days? – user1165199 Oct 07 '13 at 16:47
  • @user1165199 `as.Date` should convert anything to a date, it can't have time, I suggest `dput`'ing a tiny part of your data (or a more representative sample since it sounds like OP isn't). You can also try `as.IDate` to see if that makes a difference. – eddi Oct 07 '13 at 16:53
  • I have installed the latest version of data.table and it works now, not sure what the issue was before... Thanks for your help! – user1165199 Oct 07 '13 at 17:00
4

broken down into steps:

# Rows Selected: Iterate over each row in the DF, 
#   and check if its `date` value is within 5 from any value in the `dates` vector
rows <- sapply(df$date, function(x) any( abs(x-dates) <=  5))

# Use that result to subset your data.frame
df[rows, ]

#         date a b
# 1 2013-01-04 1 a
# 3 2013-10-01 3 c

Importantly, make sure your date values are actual Dates and not characters looking like dates

dates <- as.Date(dates)
df$date <- as.Date(df$date)
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
0

First make sure that df$date is of class date. Then:

df[df$date %in% sapply(dates, function(x) x:(x+5)),]

        date a b
1 2013-01-04 1 a
3 2013-10-01 3 c

For some reason I feel like this may be a more proper method:

 df[df$date %in% mapply(`:`, from=dates, to=dates+5),]
Señor O
  • 17,049
  • 2
  • 45
  • 47
  • 1
    just use `abs` ;) A lot less searching and iterating – Ricardo Saporta Oct 07 '13 at 15:58
  • 1
    @RicardoSaporta I think the OP asked for only 5 days after – Michele Oct 07 '13 at 16:18
  • To be fair, OP has two different wordings of his desired range that make it unclear. In any case, @RicardoSaporta's method of directly subtracting `dates` from `df$dates` should do better in most cases - although I'd be curious to see if/when this method would be better. – Señor O Oct 07 '13 at 16:25
  • 1
    Personally, I prefer @eddi's method, but of course I would :) I do however, think it is important for a user to understand what is going and how to make it happen – Ricardo Saporta Oct 07 '13 at 16:36