-1

I am relatively new to R and I'm hoping that someone could answer this question for me. I have four columns in a data frame proc_id, mco_id, start_date, and end_date. PLEASE CLICK ON THIS IMAGE TO SEE THE DATAFRAME. This is the logic that I am going after. For each combination of proc_id and mco_id, if the start_date comes right after the preceding end_date, then the final dataframe for that combination of proc_id and mco_id is the minimum start date and maximum end date.

for instance, the first three rows contains 1234 and ABC for proc_id and mco_id respectively. The start date in line 3 of the data frame is one day after the end date of line 2, and the start date of line 2 is one day after end date of line 1. So, my final dataframe for proc_id and mco_id of 1234 and ABC must have a start date of '2014-01-01' and end date of '2014-07-01'. Now if the start date for a combination of proc_id and mco_id is greater than 1 day of the lagging end date, then they stay as is. Finally, if the start date comes before the lagging end date, then similar to the first instance, the minimum start date and the maximum end date for the combination of proc_id and mco_id is considered. So, here is what I would expect for the final dataframe.

This is the final data frame that I would like to get.

Any help would be greatly appreciated.

  • 1
    Please do not post pictures of data. See [how to create a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for suggestions of better way to include data in the question itself. – MrFlick Apr 13 '17 at 20:07

1 Answers1

0

Here's a quick solution with dplyr package:

df %>% 
    group_by(proc_id, mco_id) %>% 
    summarise(start=min(start_date), end = max(end_date))

Source: local data frame [4 x 4]
Groups: proc_id [?]

  proc_id mco_id      start        end
   <fctr> <fctr>     <date>     <date>
1    1234    ABC 2014-01-01 2014-07-01
2    2345    DEF 2014-06-01 2014-07-31
3    3456    ABC 2015-01-01 2015-03-30
4    4567    ABC 2015-04-15 2015-04-30

The df used:

df <- structure(list(proc_id = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 
3L, 4L), .Label = c("1234", "2345", "3456", "4567"), class = "factor"), 
mco_id = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), .Label = c("ABC", 
"DEF"), class = "factor"), start_date = structure(c(16071, 
16161, 16191, 16222, 16252, 16436, 16478, 16540), class = "Date"), 
end_date = structure(c(16160, 16190, 16252, 16251, 16282, 
16495, 16524, 16555), class = "Date")), class = "data.frame", .Names = c("proc_id", 
"mco_id", "start_date", "end_date"), row.names = c(NA, -8L))

Note: I'd spend more time trying to transcribe your image into a readable form. Please provide reproducible example in future.

Adam Quek
  • 6,973
  • 1
  • 17
  • 23