-1

My problem is easy to explain :

  • I have one table with start dates and end dates and n rows ordered by "start date" (see image bellow - Yellow rows are the ones I want to have on one unique row with first start date and last end date)

Table with rows where dates follow

  • I would like to regroup dates on one row when start date n+1 == end date n. Here is an exemple of what I need as a reslut (image below)

Result i need

I tried to use for loops that compare the two vectors of dates (vectors extracted from the columns) but it does not really work...

I tried something like this to identify start date and end date :

'''

a = sort(data$Date_debut)
b = sort(data$Date_fin)

for(i in 1:(length(a)-1)){
   for(j in 2:length(a)){
      datedeb = a[j-1]
      if(b[i]+1 == a[j]){
       while(b[i]+1 == a[j] ){
          datefin = b[i+1]
          i = i+1}
      
    }
 }
}

''' datedeb = start date datefin = end date

Thank you for your help, I am open to ideas / ways to deal with this.

evafeel
  • 3
  • 3
  • Welcome to Stack Overflow. Please don’t use images of data as they cannot be used without a lot of unnecessary effort. [For multiple reasons](//meta.stackoverflow.com/q/285551). You’re more likely to get a positive response if your question is reproducible. [See Stack Overflow question guidance](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Peter Jan 19 '22 at 08:56

1 Answers1

0

Here is one approach using tidyverse. For each Var1 group, create subgroups containing an index based on when the start date does not equal the previous row end date (keeping those rows together with the same index). Then you can group_by both the Var1 and the index together, and use the first start date and last end date as your date ranges.

library(tidyverse)

df %>%
  group_by(Var1) %>%
  mutate(i = cumsum(Start_date != lag(End_date, default = as.Date(-Inf)) + 1)) %>%
  group_by(i, .add = T) %>%
  summarise(Start_date = first(Start_date), End_date = last(End_date)) %>%
  select(-i)

Output

  Var1  Start_date End_date  
  <chr> <date>     <date>    
1 A     2019-01-02 2019-04-09
2 A     2019-10-11 2019-10-11
3 B     2019-12-03 2019-12-20
4 C     2019-12-29 2019-12-31

Data

df <- structure(list(Var1 = c("A", "A", "A", "A", "B", "C"), Start_date = structure(c(17898, 
17962, 17993, 18180, 18233, 18259), class = "Date"), End_date = structure(c(17961, 
17992, 17995, 18180, 18250, 18261), class = "Date")), class = "data.frame", row.names = c(NA, 
-6L))
Ben
  • 28,684
  • 5
  • 23
  • 45