This probably has a really simple solution. I have two data sets. One is a vector of POSIXct tweet timestamps and the second is a vector of POSIXct ADL HEAT Map timestamps.

I'm looking to build a function that lets me take the dates from the tweets vector and for each one count the number of timestamps in the ADL HEAT Map vector that fall within a specified range from the tweet.

My aim is to build the function such that I can put in the tweets vector, the ADL vector, the number of days from the tweets vector to start counting, and the number of days from the tweets vector to stop counting, and return a vector of counts the same length as the tweets data.

I already tried the solution here, and it didn't work: Count number of occurences in date range in R

Here's an example of what I'm trying to do. Here's a smaller version of the data sets I'm using:

tweets <- c("2016-12-12 14:34:00 GMT", "2016-12-5 17:20:06 GMT")
ADLData <- c("2016-12-11 16:30:00 GMT", "2016-12-7 18:00:00 GMT", "2016-12-2 09:10:00 GMT")

I want to create a function, let's call it countingfunction, that takes the first data set, the second one, and a number of days to look back. In this example, I chose 7 days:

countingfunction(tweets, ADLData, 7)

Ideally, this would return a vector the same length as tweets (2 in this case), giving for each tweet the number of events in ADLData that occurred within the 7 days before that tweet. In this case, c(2,1).

  • Please add data using `dput` and show the expected output for the same. Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Aug 16 '20 at 13:56
  • If you read the link which I shared earlier it shares tips on how to share a reproducible example. A reproducible example is something which we can copy/paste into our session. When I copy/paste your code I receive `object 'tweets' not found` error. It would also be helpful if you show expected output using which we can verify our answers. – Ronak Shah Aug 18 '20 at 12:50
  • Edited. Sorry for the confusion. – Mike Isaacson Aug 19 '20 at 13:02
  • I added an answer, see if it helps. – Ronak Shah Aug 20 '20 at 08:27

2 Answers

So, if I have understood you correctly, you have this kind of data:

tweets <- c(as.POSIXct("2020-08-16", tz = ""), as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""), as.POSIXct("2020-08-13", tz = ""))
ADL <- c(as.POSIXct("2020-08-15", tz = ""), as.POSIXct("2020-08-14", tz = ""))

And what you want to do, is to say whether a tweet is within the ADL date range or not. That could be accomplished doing this:

ifelse(tweets %in% ADL, "its in", "its not")

You can assign this easily to another vector, which then states whether it is in or not.

Yannik Suhre

You can write countingfunction with the help of outer and calculate the difference in time between every value of two vectors using difftime.

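countingfunction <- function(x1, x2, n) {
  # Matrix of time differences in days: one row per element of x1, one column per element of x2
  mat <- outer(x1, x2, difftime, units = 'days')
  # For each row, count the x2 values that fall within the n days before the x1 value
  rowSums(mat > 0 & mat <= n)
}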

Assuming you have vectors of class POSIXct like these :

tweets <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-5 17:20:06"), tz = 'GMT')
ADLData <- as.POSIXct(c("2016-12-11 16:30:00","2016-12-7 18:00:00", 
                        "2016-12-2 09:10:00"), tz = 'GMT')
n <- 7

You can pass them as :

countingfunction(tweets, ADLData, n)
#[1] 2 1
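
The question also asks for a separate number of days at which to start and stop counting. A minimal sketch of that variant follows, assuming both bounds are given as days before each tweet. The name countingwindow and the arguments start and stop are hypothetical and not part of the answer above. The sketch subtracts the timestamps as plain seconds and converts to days, rather than using difftime, so the result is an ordinary numeric matrix.

```r
# Hypothetical variant: count events that fall more than `start` and at most
# `stop` days before each tweet. Names and argument order are assumptions.
countingwindow <- function(x1, x2, start, stop) {
  # POSIXct values are seconds since the epoch, so the pairwise difference
  # divided by 86400 is (tweet - event) in days: rows are tweets, columns are events
  days <- outer(as.numeric(x1), as.numeric(x2), "-") / 86400
  # Per tweet, count the events inside the window (start, stop]
  rowSums(days > start & days <= stop)
}

tweets <- as.POSIXct(c("2016-12-12 14:34:00", "2016-12-5 17:20:06"), tz = "GMT")
ADLData <- as.POSIXct(c("2016-12-11 16:30:00", "2016-12-7 18:00:00",
                        "2016-12-2 09:10:00"), tz = "GMT")

# Same window as countingfunction(tweets, ADLData, 7)
countingwindow(tweets, ADLData, 0, 7)
#[1] 2 1

# Skip events from the most recent day
countingwindow(tweets, ADLData, 1, 7)
#[1] 1 1
```

With start = 0 this reproduces the counts above. Whether each bound should be inclusive or exclusive is a design choice to adjust as needed.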
Ronak Shah