1

I have a data frame in R magic query with one column with device id and the other with time. The time is in the format of 2013-05-02 09:20:12

The dataframe is in ascending order. I am trying to remove all rows where the time column, based on days only, is greater than X days from the row above it.

Is there a way to apply this?

Brandon Minnick
  • 13,342
  • 15
  • 65
  • 123
  • 1
    Please post the relevant code you have written so far so we can help. – Souvik Ghosh Sep 05 '17 at 04:43
  • 1
    Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269) . This will make it much easier for others to help you. – Jaap Sep 05 '17 at 06:01
  • Do you want it to be recursive ? – moodymudskipper Sep 05 '17 at 08:55

4 Answers4

1

First create a reproducible example:

dates = seq(as.Date('2016-01-01'), as.Date('2016-07-01'), by = "days")
df <- data.frame(dateCol = sample(dates, 100))

Then you can create a computational column that takes the previous date and adds 4 (or X days) to it:

df$plusXdays = NA
df$plusXdays[2:nrow(df)] = df$dateCol[1:(nrow(df)-1)]
df$plusXdays = df$plusXdays + 4

Then you can filter on your column:

df = df[df$dateCol > df$plusXdays, ]

And drop the computational column:

df$plusXdays = NULL

Hope it helps.

Z.Lin
  • 28,055
  • 6
  • 54
  • 94
Emmanuel-Lin
  • 1,848
  • 1
  • 16
  • 31
1
# create data.frame
d <- data.frame(device_id=1:15,x.time= as.POSIXct(x = c( '01.01.2017 16:50:00', '03.01.2017 13:09:25', 
             '05.01.2017 0:23:55', '06.01.2017 12:17:55', '08.01.2017 5:10:16',
             '09.01.2017 18:00:07', '11.01.2017 17:46:31', '13.01.2017 1:05:36', 
             '14.01.2017 20:00:39', '15.01.2017 22:42:36', '17.01.2017 3:05:08', 
             '18.01.2017 12:38:02', '20.01.2017 1:02:43', '21.01.2017 2:56:34', 
             '22.01.2017 14:55:48' ),format = "%d.%m.%Y %H:%M:%S"))

print(d)
X <- 1 # set days difference
library(dplyr)
d %>% tbl_df %>% mutate(x.dates=as.Date(x.time),prev.diff=x.dates-lag(x.dates)) %>% print %>% # calculate and view days between... 
filter(prev.diff<=X) # select dates which have days between previos date <= X day
Edvardoss
  • 393
  • 3
  • 8
1

Try this:

subset(df,c(0,diff(as.Date(time))) <=X)
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
0

You could try this:

Assume your table looks like this:

device_id <- c("A1", "A2", "A3", "A4")
date <- c("2001-01-01", "2001-01-01", "2001-01-02", "2001-01-03")
dat <- data.frame(device_id, date)
dat$date <- as.Date(dat$date)

You can create an additional column with the dates from the row above:

dat["date.previous"] <- as.Date(c(NA, head(dat["date"], dim(dat)[1] - 1)[[1]]))

Then you create an auxiliary column to identify the difference between these two:

dat$date.diff <- dat$date - dat$date.previous

And lastly, you exclude the ones that are greater than X days (in my case 1)

library("dplyr")
dat <- subset(dat, !date.diff >= 1)
deca
  • 730
  • 1
  • 8
  • 24