
I found that my question might be similar to this one, but I couldn't figure out how to adjust it to my case.

I have one data set containing the dates on which images were taken and another containing the dates on which it rained. I would like to remove the images that were taken within 3 days after it rained.

For example:

df1 <- data.frame(img  = c(1, 2, 3, 4),
                  date = as.Date(c("1934-05-20", "1934-05-04", "1934-05-03", "1934-05-01")))

df2 <- data.frame(rain = c(3, 8, 64, 5, 7),
                  date = as.Date(c("1934-05-27", "1934-05-25", "1934-05-15", "1934-05-04", "1934-05-02")))

Giving us:

> df1
  img       date
1   1 1934-05-20
2   2 1934-05-04
3   3 1934-05-03
4   4 1934-05-01

> df2
   rain       date
1     3 1934-05-27
2     8 1934-05-25
3    64 1934-05-15
4     5 1934-05-04
5     7 1934-05-02

The output would look like:

  img       date
1   1 1934-05-20
4   4 1934-05-01

Update:

I used a rather crude approach, but it worked for me:

# Flag each image taken 0-3 days after a rain event (TRUE = remove)
mylist <- logical(nrow(df1))
for (i in seq_len(nrow(df1))) {
  x  <- df1$date[i]                            # indexing by position keeps the Date class
  yr <- format(x, "%Y")
  # Only rain events from the same year are compared, so rain on 31 Dec
  # followed by an image on 2 Jan is not caught
  r  <- subset(df2, format(date, "%Y") == yr)
  y  <- as.numeric(x - r$date)                 # days since each rain event
  mylist[i] <- any(y >= 0 & y <= 3)
}
dat <- df1[!mylist, ]
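A possible vectorized alternative, sketched under the assumption that both data frames have a `Date`-class column called `date` as in the example above: sort the rain dates once, use base R's `findInterval()` to locate the most recent rain event on or before each image, and drop the image if that event is at most 3 days earlier. If any rain falls in the window, the most recent rain before the image does too, so checking that single event is enough. The function name `drop_after_rain` and its `max_days` argument are hypothetical.

```r
df1 <- data.frame(img  = c(1, 2, 3, 4),
                  date = as.Date(c("1934-05-20", "1934-05-04", "1934-05-03", "1934-05-01")))
df2 <- data.frame(rain = c(3, 8, 64, 5, 7),
                  date = as.Date(c("1934-05-27", "1934-05-25", "1934-05-15", "1934-05-04", "1934-05-02")))

# Hypothetical helper: drop images taken 0..max_days days after any rain event
drop_after_rain <- function(images, rain, max_days = 3) {
  rain_days <- sort(as.numeric(rain$date))   # days since 1970-01-01, ascending
  img_days  <- as.numeric(images$date)
  # Index of the latest rain event on or before each image (0 = no earlier rain)
  pos <- findInterval(img_days, rain_days)
  gap <- rep(Inf, length(img_days))          # Inf = image precedes all rain events
  gap[pos > 0] <- img_days[pos > 0] - rain_days[pos[pos > 0]]
  images[gap > max_days, , drop = FALSE]
}

drop_after_rain(df1, df2)   # keeps images 1 and 4
```

This avoids comparing every image with every rain event, so it should scale better to large files than a per-image loop, and it also compares across year boundaries.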
Valerija
  • Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo May 23 '17 at 18:29
  • Not sure without additional info, but [this](https://stackoverflow.com/questions/41602763/how-to-combine-r-dataframes-based-constraints-on-a-time-column/41602979#41602979) might help. – Nick Criswell May 23 '17 at 18:35

1 Answer


First create an index with sapply:

# TRUE when no rain event fell 0 to 2 days (y < 3) before the image date
idx <- sapply(df1$date, function(x) {y <- x - df2$date; sum(y >= 0 & y < 3) == 0})

You can use the index to subset df1:

df1[idx,]

which gives:

> df1[idx,]
  img       date
1   1 1934-05-20
4   4 1934-05-01
Jaap
  • Dear Jaap, do you have any idea why your code would not work with larger data sets? I have checked the class of all the dates and they are in Date format, but I am getting `system.index Date NA NA.1 NA.2 NA.3 NA.4 NA.5 ` – Valerija May 23 '17 at 20:59
  • @Valerija Does `df1[which(idx),]` give you a better result? – Jaap May 23 '17 at 21:05
  • No, unfortunately not. I get `[1] system.index Date <0 rows> (or 0-length row.names)` – Valerija May 23 '17 at 21:10
  • If it helps, here are the files containing the [image data](https://www.dropbox.com/s/r80skszu16jibje/img5.csv?dl=0) and the [rain events](https://www.dropbox.com/s/i2sohr5ihj9u5g3/rain.csv?dl=0) @Jaap – Valerija May 23 '17 at 22:42
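One possible cause of the problem with the larger files, offered only as a guess: the output quoted in the comments has columns named `system.index` and `Date`, while the answer's code refers to `df1$date`. R column names are case-sensitive, so `df1$date` would be `NULL` on such a file. The sketch below wraps the answer's test in a hypothetical `rain_free()` helper that takes the column names as arguments, stops if a column is missing, and uses `vapply()` so the result is always a logical vector. Its `max_days` window is inclusive (`y <= max_days`), like the loop in the question; the answer's `y < 3` corresponds to `max_days = 2`.

```r
df1 <- data.frame(img  = c(1, 2, 3, 4),
                  date = as.Date(c("1934-05-20", "1934-05-04", "1934-05-03", "1934-05-01")))
df2 <- data.frame(rain = c(3, 8, 64, 5, 7),
                  date = as.Date(c("1934-05-27", "1934-05-25", "1934-05-15", "1934-05-04", "1934-05-02")))

# Hypothetical helper: TRUE for images with no rain 0..max_days days earlier
rain_free <- function(images, rain, img_col = "date", rain_col = "date", max_days = 3) {
  # Fail loudly instead of silently working on NULL columns
  stopifnot(img_col %in% names(images), rain_col %in% names(rain))
  img_dates  <- as.Date(images[[img_col]])   # assumes Date or ISO "YYYY-MM-DD" text
  rain_dates <- as.Date(rain[[rain_col]])
  vapply(seq_along(img_dates), function(i) {
    y <- as.numeric(img_dates[i] - rain_dates)   # days since each rain event
    !any(y >= 0 & y <= max_days)                 # TRUE if no rain in the window
  }, logical(1))
}

idx <- rain_free(df1, df2)
df1[idx, ]   # keeps images 1 and 4
```

For the linked files, assuming both use a column named `Date` (only the image file's header is visible in the comments), the call would be something like `rain_free(img, rain, img_col = "Date", rain_col = "Date")`, where `img` and `rain` are hypothetical names for the data frames read from the CSV files.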