-1

a question from R newbie user: I have a data with consecutive enabled_datetime and disabled_datetime as shown below:

x<-as.data.frame(cbind(
      supplier_id=281743,
      enabled_datetime=c('2016-06-13 13:31:02','2016-06-14 07:39:19','2016-06-14 12:36:03','2016-06-16 13:44:30','2016-06-17 06:42:14'),
      disabled_datetime = c('2016-06-14 07:39:19','2016-06-14 12:36:03','2016-06-16 13:44:30','2016-06-17 06:42:14',  NA),
      discount=c(25,15,15,10,30))
)
x

supplier_id    enabled_datetime   disabled_datetime discount
      281743 2016-06-13 13:31:02 2016-06-14 07:39:19       25
      281743 2016-06-14 07:39:19 2016-06-14 12:36:03       15
      281743 2016-06-14 12:36:03 2016-06-16 13:44:30       15
      281743 2016-06-16 13:44:30 2016-06-17 06:42:14       10
      281743 2016-06-17 06:42:14                <NA>       30

What I'd like to transform into is like this:

supplier_id    enabled_datetime   disabled_datetime discount
      281743 2016-06-13 13:31:02 2016-06-14 07:39:19       25
      281743 2016-06-14 07:39:19 2016-06-16 13:44:30       15
      281743 2016-06-16 13:44:30 2016-06-17 06:42:14       10
      281743 2016-06-17 06:42:14                <NA>       30

i.e. merge rows with the same supplier_id, discount and have consecutive enabled_datetime and disabled_datetime. What I can think of is to use for loop, any one know how to do so different way? Thanks in advance.

macchiavalley
  • 517
  • 1
  • 5
  • 12
  • 3
    Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Jun 27 '16 at 12:54
  • Read about `lead()`, `lag()` and [merge](http://stackoverflow.com/questions/1299871). – zx8754 Jun 27 '16 at 12:54
  • why merging ? group by supplier_id and discount, and arrange the date, you can do this with dplyr ...correct me if i'm wrong ? PS: pls why downvoting like this ? – Mostafa90 Jun 27 '16 at 12:59
  • @DimitriPetrenko it's not 'merge' in as used in dplyr, what I mean is I want to combine those two rows into one rows. Any idea how to do so (in dplyr is ok)? – macchiavalley Jun 27 '16 at 13:20

1 Answers1

2
 df <- data.frame(supplier_id = c(281743,281743,281743,281743,281743),
                 enabled_datetime = c("2016-06-13 13:31:02","2016-06-14 07:39:19","2016-06-14 12:36:03","2016-06-16 13:44:30","2016-06-17 06:42:14"),
                 disabled_datetime = c("2016-06-14 07:39:19","2016-06-14 12:36:03","2016-06-16 13:44:30","2016-06-17 06:42:14",NA),
                 discount = c(25,15,15,10,30))

df <- df%>%
  mutate(enabled_datetime = as.POSIXct(strftime(enabled_datetime,format="%Y-%m-%d %H:%M:%S")),
         disabled_datetime = as.POSIXct(strftime(disabled_datetime,format="%Y-%m-%d %H:%M:%S")))

subdf1 <- df%>% 
  group_by(supplier_id,discount)  %>%
  mutate(enabled_datetime_lead = lead(enabled_datetime),disabled_datetime_lead = lead(disabled_datetime)) %>%
  filter(disabled_datetime==enabled_datetime_lead) %>% mutate(disabled_datetime = disabled_datetime_lead) %>% 
  select(-enabled_datetime_lead,-disabled_datetime_lead) %>% ungroup()

subdf2<- anti_join(df,resdf,by=c("supplier_id","discount"))

resdf <- full_join(subdf1,subdf2,,by=c("supplier_id","discount"))

The result of this is

supplier_id    enabled_datetime   disabled_datetime discount
        <dbl>              <time>              <time>    <dbl>
1      281743 2016-06-14 07:39:19 2016-06-16 13:44:30       15
2      281743 2016-06-13 13:31:02 2016-06-14 07:39:19       25
3      281743 2016-06-16 13:44:30 2016-06-17 06:42:14       10
4      281743 2016-06-17 06:42:14                <NA>       30

explanation for changes: changed the final statement from full_join to union as the final result got two new columns. the behaviour is different to what was originally found.

ArunK
  • 1,731
  • 16
  • 35