
I would like to keep only the first observation using the filter() function from dplyr. Filtering gives me many rows that satisfy the same criterion, but I only want to keep the first one, without resorting to group_by() and distinct(). Is it possible?

I need to extract from a data frame the row with the first date stamp and the row with the first date stamp where "Bad" appears.

problem = data.frame(
  Status = c("Good",  "Good",  "Bad", "Bad", "Bad"),
  Date_entry = c(as.Date("2000-01-01"), as.Date("2000-01-02"), as.Date("2000-01-03"), as.Date("2000-01-04"),as.Date("2000-01-05")),
  Date_status = c(as.Date("1999-01-01"), as.Date("1999-01-01"), as.Date("1999-01-02"), as.Date("1999-01-02"), as.Date("1999-01-02")),
  Value = c(150,20,14,96,04))

I can use filter(Date_entry == min(Date_entry)), but then I don't know how to filter out exactly the first "Bad" outcome. I tried filter(Date_entry == min(Date_entry) | (Date_status - Date_entry) == min(Date_status - Date_entry)), but it still does not work.

solution = 
  data.frame(Status = c("Good", "Bad"),
             Date_entry = c(as.Date("2000-01-01"), as.Date("2000-01-02")),
             Date_status = c(as.Date("1999-01-01"), as.Date("1999-01-02")),
             Value = c(150,20))
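A quick way to see why the attempted filter returns the wrong second row is to print the day gaps it minimizes. This is a minimal sketch that rebuilds the question's data (calling `as.Date()` once on a character vector, which is equivalent to the per-element calls above); `gap` and `attempt` are illustrative names only.

```r
library(dplyr)

# Same data as in the question
problem <- data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                         "2000-01-04", "2000-01-05")),
  Date_status = as.Date(c("1999-01-01", "1999-01-01", "1999-01-02",
                          "1999-01-02", "1999-01-02")),
  Value = c(150, 20, 14, 96, 4)
)

# Difference in days between the two date columns, row by row
gap <- as.numeric(problem$Date_status - problem$Date_entry)
print(gap)  # -365 -366 -366 -367 -368

# The attempted filter: the second condition matches the row with the
# smallest (most negative) gap, which is the last row, not the first "Bad"
attempt <- problem %>%
  filter(Date_entry == min(Date_entry) |
           (Date_status - Date_entry) == min(Date_status - Date_entry))
print(attempt)  # rows 1 and 5
```

Because Date_entry advances one day per row while Date_status moves by at most one day, the gap keeps shrinking down the table, so the minimum lands on the last "Bad" row rather than the first.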
             
Mr Frog
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Maybe you just need `slice(1)`? – MrFlick Nov 23 '20 at 16:29
  • Can it be used inside filter()? – Mr Frog Nov 23 '20 at 16:31
  • It would be used after `filter()`. But it's a bit unclear exactly what you mean from your description. Again, a reproducible example would make things much clearer. – MrFlick Nov 23 '20 at 16:32
  • You can use `filter(row_number() == 1)`, but if you can relax your "using the `filter` function" requirement, this is what `slice` is made for. Or `head(1)` would also work. – Gregor Thomas Nov 23 '20 at 16:33
  • Oh, I see - reading your question again it seems like you have *some condition* and you only want the first row that meets *that condition*. A reproducible example would make this much clearer... – Gregor Thomas Nov 23 '20 at 16:37
  • I've added a reproducible example. – Mr Frog Nov 23 '20 at 16:46
  • I just want to use filter to pick the first "Bad" outcome and ignore the remaining ones, because the code is too messy to resort to grouping. – Mr Frog Nov 23 '20 at 16:47
  • So you want the first row of the data.frame no matter what it is as well as the first "Bad" row? Your `solution` seems odd because your Bad row has values from the second Good row. What happens if the first row is Bad? – MrFlick Nov 23 '20 at 17:26
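The options mentioned in the comments can be compared side by side. This is a minimal sketch, assuming dplyr is installed and reusing the question's data; the object names are illustrative only.

```r
library(dplyr)

# The question's data
problem <- data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                         "2000-01-04", "2000-01-05")),
  Date_status = as.Date(c("1999-01-01", "1999-01-01", "1999-01-02",
                          "1999-01-02", "1999-01-02")),
  Value = c(150, 20, 14, 96, 4)
)

# Three ways to keep only the first row
via_filter <- problem %>% filter(row_number() == 1)
via_slice  <- problem %>% slice(1)
via_head   <- problem %>% head(1)

# First row that meets a condition: filter on the condition, then keep one row
first_bad <- problem %>% filter(Status == "Bad") %>% slice(1)
print(first_bad)
```

`slice(1)` and `head(1)` state the intent most directly; `filter(row_number() == 1)` is mainly handy when it has to be combined with other conditions inside the same `filter()` call.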

3 Answers


I think what you are asking for could be solved with

library(dplyr)
problem %>% 
  filter(Date_entry==min(Date_entry) | cumsum(Status=="Bad")==1)

Here we keep the row with the minimum date, or the first "Bad" row via a cumsum (cumulative sum) trick. The running count goes up by one each time a "Bad" is observed, so we just select the row where it equals 1 (if present).
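One caveat worth checking: `cumsum(Status == "Bad")` stays at 1 on every row between the first and second "Bad", so a "Good" row in that stretch is kept too. The sketch below uses a small hypothetical data frame (`mixed`, not from the question) to show this, together with a stricter variant that also requires the row itself to be "Bad".

```r
library(dplyr)

# Hypothetical data: a "Good" row sits between two "Bad" rows
mixed <- data.frame(
  Status = c("Good", "Bad", "Good", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04"))
)

# Running count of "Bad" rows
print(cumsum(mixed$Status == "Bad"))  # 0 1 1 2

# cumsum(...) == 1 alone also matches row 3, a "Good" row after the first "Bad"
loose <- mixed %>%
  filter(Date_entry == min(Date_entry) | cumsum(Status == "Bad") == 1)

# Requiring Status == "Bad" as well keeps only the first "Bad" row
strict <- mixed %>%
  filter(Date_entry == min(Date_entry) |
           (Status == "Bad" & cumsum(Status == "Bad") == 1))
print(strict)
```

On the question's `problem` data both versions return the same two rows, because the "Bad" rows there are consecutive.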

MrFlick

Something like this?

library(dplyr)
df <- data.frame(A=c(1,1,1,1,1,2,2,2,2,2),
                 B=c(1,2,3,4,5,1,2,3,4,5))
head(df %>% filter(A==1),1)
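If the goal is to stay inside a single `filter()` call, as the question asks, one option is to compare `row_number()` with the position of the first match. This is a sketch assuming dplyr, rebuilding the same `df`; `match(TRUE, cond)` is base R and returns the index of the first `TRUE`, or `NA` when there is none.

```r
library(dplyr)

df <- data.frame(A = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 B = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5))

# Keeps at most one row: the first one where A == 2
first_a2 <- df %>% filter(row_number() == match(TRUE, A == 2))
print(first_a2)

# With no match, the comparison is NA for every row, and filter() drops NA rows
none <- df %>% filter(row_number() == match(TRUE, A == 3))
print(nrow(none))
```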
Marcos Pérez

An option with slice

library(dplyr)
problem %>%
   slice(union(which.min(Date_entry), match('Bad', Status)))

-output

#  Status Date_entry Date_status Value
#1   Good 2000-01-01  1999-01-01   150
#2    Bad 2000-01-03  1999-01-02    14
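To make the `slice()` answer easier to follow, the sketch below evaluates the index vector step by step on the question's data. It is a minimal illustration assuming dplyr is installed; the intermediate names are hypothetical.

```r
library(dplyr)

# The question's data
problem <- data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03",
                         "2000-01-04", "2000-01-05")),
  Date_status = as.Date(c("1999-01-01", "1999-01-01", "1999-01-02",
                          "1999-01-02", "1999-01-02")),
  Value = c(150, 20, 14, 96, 4)
)

first_date <- which.min(problem$Date_entry)  # position of the earliest Date_entry
first_bad  <- match("Bad", problem$Status)   # position of the first "Bad"
rows <- union(first_date, first_bad)         # combines both, dropping a duplicate
print(rows)  # 1 3

result <- problem %>% slice(rows)
print(result)
```

`union()` also covers the case raised in the comments where the earliest row is itself "Bad": both positions are then 1, `union(1, 1)` keeps a single index, and that row is returned once.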
akrun