
Thanks in advance!

I have two large datasets, both of which contain date/time columns of interest. The first (its head() is pasted below) has a single date/time field that I am interested in: the ‘RoundDateTimeGMT’ column. This dataset is rather large (over 500,000 rows). Each row belongs to an individual identified by the PumaID column.

   PumaID   RoundDateTimeGMT
1    P01    3/3/2011 0:00
2    P01    3/3/2011 0:00
3    P01    3/3/2011 0:00
4    P01    3/3/2011 0:00
5    P01    3/3/2011 0:00
6    P01    3/3/2011 0:00

The second dataset has two date/time fields representing a start and an end time (‘FstClstrTime’ and ‘LastClstrTime’, respectively), shown below. All times have been converted to a date-time class that R recognizes using as.POSIXct(). As above, these data are also specific to an individual identified by the PumaID column.
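For reference, a minimal sketch of that conversion. The sample strings are hypothetical, and the format string and the GMT time zone are assumptions based on the month/day/year layout shown in the tables and on the name of the RoundDateTimeGMT column.

```r
# Hypothetical sample strings in the month/day/year layout shown in the tables
raw_times <- c("8/29/2011 6:01", NA, "9/10/2011 2:00")

# Assumption: the timestamps are in GMT, as the column name RoundDateTimeGMT suggests
parsed <- as.POSIXct(raw_times, format = "%m/%d/%Y %H:%M", tz = "GMT")

# Missing values stay NA after parsing
print(parsed)
```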

   PumaID   FstClstrTime       LastClstrTime
1    P01    8/29/2011 6:01     8/29/2011 8:01
2    P01      <NA>                  <NA>
3    P01    9/10/2011 2:00     9/12/2011 12:01
4    P01    9/9/2011 8:00      9/9/2011 14:01
5    P01    9/7/2011 8:01      9/8/2011 10:00
6    P01    9/4/2011 10:01     9/6/2011 12:01

My goal is to create a new binary column in the first dataset that indicates whether RoundDateTimeGMT falls between the ‘FstClstrTime’ and ‘LastClstrTime’ of any row in the second dataset for the same individual. The comparison is only needed where the PumaID values of the two datasets match. I think this can be done with a for() loop, but I am open to any suggestions. I just need to check every RoundDateTimeGMT (again, there are over 500,000) against every ‘FstClstrTime’ and ‘LastClstrTime’ pair for each individual.
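One possible way to do this check is sketched below, on made-up data. The data frame names `locs` and `clusters`, the new column name `InCluster`, and the sample values are all hypothetical. The sketch also assumes that both endpoints of an interval count as "between" and that clusters with a missing start or end can be ignored. It loops over the rows of the second dataset rather than over the 500,000 rows of the first, so each iteration is a vectorised comparison.

```r
fmt <- "%m/%d/%Y %H:%M"

# Hypothetical miniature version of the first dataset
locs <- data.frame(
  PumaID = c("P01", "P01", "P01", "P02"),
  RoundDateTimeGMT = as.POSIXct(
    c("3/3/2011 0:00", "8/29/2011 7:00", "9/11/2011 0:00", "8/29/2011 7:00"),
    format = fmt, tz = "GMT"),
  stringsAsFactors = FALSE
)

# Hypothetical miniature version of the second dataset
clusters <- data.frame(
  PumaID = c("P01", "P01", "P01"),
  FstClstrTime = as.POSIXct(
    c("8/29/2011 6:01", NA, "9/10/2011 2:00"), format = fmt, tz = "GMT"),
  LastClstrTime = as.POSIXct(
    c("8/29/2011 8:01", NA, "9/12/2011 12:01"), format = fmt, tz = "GMT"),
  stringsAsFactors = FALSE
)

# Assumption: a cluster with a missing start or end cannot contain any time
ok <- !is.na(clusters$FstClstrTime) & !is.na(clusters$LastClstrTime)
clusters <- clusters[ok, ]

# Start with 0 everywhere, then flag rows that fall inside any interval
locs$InCluster <- 0L
for (i in seq_len(nrow(clusters))) {
  hit <- as.character(locs$PumaID) == as.character(clusters$PumaID[i]) &
    locs$RoundDateTimeGMT >= clusters$FstClstrTime[i] &  # endpoints inclusive
    locs$RoundDateTimeGMT <= clusters$LastClstrTime[i]
  # which() drops any NA comparisons before assigning
  locs$InCluster[which(hit)] <- 1L
}

print(locs)
```

The number of loop iterations equals the number of usable rows in the second dataset, so the cost is driven by how many clusters there are rather than by the size of the first dataset.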

Because the datasets are so large, dput() does not work, so apologies for not attaching any data. I hope you can still offer some suggestions on how to accomplish the above goal.

Kind regards!

B. Davis
  • Please show some more effort when posting the data. [Last time you posted data](http://stackoverflow.com/questions/18797774/trying-to-aggregate-and-average-time-series-data-collected-every-five-minutes), you got the comment: "Also [read this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) before posting. It makes things harder if we cannot just copy-paste or load your data". Please don't expect someone else to both create a minimal, reproducible example _and_ answer your question. Why don't you just post the variables of interest? – Henrik Sep 15 '13 at 23:20
  • Adding to @Henrik: `dput` works if you subset the data. If your data is `df` then you can subset `df[1:5,]` – Metrics Sep 15 '13 at 23:48
  • And pick only the columns _relevant for your question_, e.g. `df1[sufficient-number-of-rows-not-more-not-less, c("PumaID", "RoundDateTimeGMT")]`, and similarly for the second data frame. – Henrik Sep 16 '13 at 00:09
  • Good suggestions, and apologies for the poor post. The dput() output seems way too long, given that a meaningful minimal example still requires lots of data. I have incorporated some of your suggestions. If there is assistance you can offer, that would be great, although I do not expect anyone to produce a minimal, reproducible example and answer my question, as suggested in an earlier comment. Thanks – B. Davis Sep 16 '13 at 02:33
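The subsetting suggested in the comments can be sketched as follows. The data frame `df1` here is a hypothetical stand-in for the real first dataset, and the extra column `Other` only illustrates dropping columns that are not relevant to the question.

```r
# Hypothetical stand-in for the real (much larger) first dataset
df1 <- data.frame(
  PumaID = rep("P01", 10),
  RoundDateTimeGMT = rep("3/3/2011 0:00", 10),
  Other = seq_len(10),
  stringsAsFactors = FALSE
)

# Keep a handful of rows and only the columns relevant to the question
small <- df1[1:5, c("PumaID", "RoundDateTimeGMT")]

# dput() on the small subset gives output short enough to paste into a post
dput(small)
```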
