
I am working on a data cleaning and calculation problem. I have created a sample dataset here for a single unit, A. The data is sorted by the timestamp column within each unit, and there are other columns as well. I need to keep only the rows where event_log_value_desc changes value: when the same value is repeated in consecutive rows, only the row with its first occurrence should be returned. The result should therefore alternate between OFF and ON for each unit.

The program should return the following:

[image: expected output]

Ric S
User1101
  • Can you please provide some sample data by running `dput` on it and pasting the output into your question? That will make it easier for people to help you. Thanks – Ric S Apr 01 '20 at 07:18
  • Please take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and modify your question to include a smaller sample taken from your data (check `?dput`). Posting images of your data, or no data at all, makes it difficult or impossible for us to help you! – massisenergy Apr 01 '20 at 07:32
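
As the comments suggest, `dput` prints an R expression that recreates an object, so its output can be pasted straight into a question. Here is a minimal sketch using a made-up data frame: the values are invented for illustration, and the column name `timestamp` is an assumption based on the question's description.

```r
# Hypothetical stand-in for the real data; all values are invented
sample_df <- data.frame(
  unit = c("A", "A", "A", "A"),
  timestamp = c(1, 2, 3, 4),
  event_log_value_desc = c("OFF", "OFF", "ON", "ON"),
  stringsAsFactors = FALSE
)

# dput() prints R code that rebuilds the object;
# paste that printed text into the question
dput(sample_df)

# For a large data frame, share only the first few rows
dput(head(sample_df, 2))
```

Anyone answering can then assign the pasted expression to a variable and work with the same data.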

1 Answer


I have not tested this solution on your dataset, but I believe it should work:

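library(dplyr)

df %>% 
  # handle each unit separately
  group_by(unit) %>% 
  # previous row's value within the unit (NA for the unit's first row)
  mutate(event_log_value_desc_lag = lag(event_log_value_desc)) %>% 
  # keep the first row of each unit and every row where the value changes
  filter(event_log_value_desc != event_log_value_desc_lag | is.na(event_log_value_desc_lag)) %>% 
  ungroup()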
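
To see what the lag-and-filter approach does, it can be run on a small made-up dataset. This is only a sketch: the data below is hypothetical (the question did not include reproducible data), and the column name `timestamp` is assumed from the question's description. The rows are assumed to be sorted by timestamp within each unit already.

```r
library(dplyr)

# Hypothetical sample data: two units, sorted by timestamp within each unit
df <- data.frame(
  unit = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  timestamp = c(1, 2, 3, 4, 5, 6, 1, 2, 3),
  event_log_value_desc = c("OFF", "OFF", "ON", "ON", "ON", "OFF",
                           "ON", "OFF", "OFF"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(unit) %>%
  # previous value within the unit; NA on the unit's first row
  mutate(event_log_value_desc_lag = lag(event_log_value_desc)) %>%
  # a row survives if it starts the unit or differs from the row before it
  filter(event_log_value_desc != event_log_value_desc_lag |
           is.na(event_log_value_desc_lag)) %>%
  ungroup() %>%
  # drop the helper column once filtering is done
  select(-event_log_value_desc_lag)

print(result)
```

For unit A, this keeps the rows at timestamps 1, 3 and 6 (OFF, ON, OFF); for unit B, it keeps timestamps 1 and 2 (ON, OFF). The `is.na()` check matters because comparing a value with `NA` yields `NA`, and `filter()` would otherwise drop the first row of every unit.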
Ric S
  • Thank you Ric. It worked; I tested it on a few units. – User1101 Apr 01 '20 at 09:33
  • @User1101 Nice to hear that! If you would like me to clarify what the code does, I will edit my answer to add a short explanation. Also, if you found my answer useful, please consider upvoting it :) – Ric S Apr 01 '20 at 09:44