1

Here is some Example Data:

Begin = c("10-10-2010 12:15:35", "10-10-2010 12:20:52", "10-10-2010 12:23:45", "10-10-2010 12:25:01", "10-10-2010 12:30:29")

End = c("10-10-2010 12:24:23", "10-10-2010 12:23:30", "10-10-2010 12:45:15", "10-10-2010 12:32:11", "10-10-2010 12:45:05")

df = data.frame(Begin, End)

I want to count the number of events that have not currently finished when a new event begins and record it in a new column. So for this particular example the end result that is desired would be a column with values: 0, 1, 1, 1, 2

I need this to be coded in R please. I found a way to calculate this in SAS with a lag function but I do not like that method for various reasons and would like something that works better in R.

In reality I have 36,000 rows and this is dealing with power outages.

Someone asked me to post what I have tried, well. In SAS I was successful with a lag function as I said. That method did not work well because you have to hardcode a ton and its not efficient.

In R I tried to sort by begintime and number from 1-36k then sort by endtime and number from 1-36k and then try some ifthen logic but hit a wall and do not think that will work either.

My question was told to be edited to be made available to the community again. The only reason I can imagine is because there are too many possible answers. Well, I didn't edit anything, but I added this excerpt. In programming there will be many answers for any 'good' question that is not exactly the most simple question (but even those have many answers, especially in R). This question is one I know many people will ask throughout time and frankly it is hard to find a source of information on how to do this in R online. The answer to this question was very short and it worked perfectly. It would be a shame to not make this question available to the community as the point of stackoverflow is to attain a repertoire of great questions so basically their name will be pulled up when people google things along the lines of that question.

TravisLong
  • 69
  • 5
  • 4
    Please post what you have tried so far and some specific questions that we can help with. Otherwise this looks like a 'please write my code for me' question which will not get a good response. – Ali Beadle Dec 30 '16 at 12:36
  • 3
    Do not post your data as an image, please learn how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Dec 30 '16 at 12:36
  • I said what I had Tried and was successful with Ali, Thank you. I'm sorry you think this looks like a please write my code for me question and I hope other choose not to respond to it maliciously. I think that would just say something about this community considering I believe its clearly a harder datetime manipulation than many others. I hope others don't think you should have to justify yourself before asking a question. – TravisLong Dec 30 '16 at 12:45

1 Answers1

0

Maybe this helps:

library(lubridate)
library(data.table)
df <- as.data.frame(lapply(df, dmy_hms))
dt <- as.data.table(df)
setkey(dt,Begin,End)[,id:=.I]
merge(dt, foverlaps(dt,dt)[id>i.id,.N,by="Begin,End"], all.x=T)[,id:=NULL][is.na(N),N:=0][]
# Begin                 End N
# 1: 2010-10-10 12:15:35 2010-10-10 12:24:23 0
# 2: 2010-10-10 12:20:52 2010-10-10 12:23:30 1
# 3: 2010-10-10 12:23:45 2010-10-10 12:45:15 1
# 4: 2010-10-10 12:25:01 2010-10-10 12:32:11 1
# 5: 2010-10-10 12:30:29 2010-10-10 12:45:05 2
lukeA
  • 53,097
  • 5
  • 97
  • 100