Filtering the first row in R

Question

I would like to keep only the first observation when using the filter() function from dplyr. Filtering returns many rows that satisfy the same criterion, but I only want to keep the first one, without resorting to group_by() and distinct(). Is that possible?

I need to extract from a data frame the row with the first date stamp and the first row where Status is "Bad".

problem = data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05")),
  Date_status = as.Date(c("1999-01-01", "1999-01-01", "1999-01-02", "1999-01-02", "1999-01-02")),
  Value = c(150, 20, 14, 96, 4))

I can use filter(Date_entry == min(Date_entry)) to get the first date, but I don't know how to also pick out only the first "Bad" row. I tried filter(Date_entry == min(Date_entry) | (Date_status - Date_entry) == min(Date_status - Date_entry)), but it still does not work.

solution = 
  data.frame(Status = c("Good", "Bad"),
             Date_entry = c(as.Date("2000-01-01"), as.Date("2000-01-02")),
             Date_status = c(as.Date("1999-01-01"), as.Date("1999-01-02")),
             Value = c(150,20))

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Maybe you just need `slice(1)`? — MrFlick, Nov 23 '20 at 16:29
It would be used after `filter()`. But it's a bit unclear exactly what you mean from your description. Again, a reproducible example would make things much clearer. — MrFlick, Nov 23 '20 at 16:32
You can use `filter(row_number() == 1)`, but if you can relax your "using the `filter` function" requirement, this is what `slice` is made for. Or `head(1)` would also work. — Gregor Thomas, Nov 23 '20 at 16:33
Oh, I see - reading your question again it seems like you have *some condition* and you only want the first row that meets *that condition*. A reproducible example would make this much clearer... — Gregor Thomas, Nov 23 '20 at 16:37
I just want to pick the first "Bad" outcome with filter and ignore the remaining ones, because the code is too messy to resort to grouping — Mr Frog, Nov 23 '20 at 16:47
So you want the first row of the data.frame no matter what it is as well as the first "Bad" row? Your `solution` seems odd because your Bad row has values from the second Good row. What happens if the first row is Bad? — MrFlick, Nov 23 '20 at 17:26

score 1 · Accepted Answer · answered Nov 23 '20 at 17:30

I think what you are asking for could be solved with

library(dplyr)
problem %>%
  filter(Date_entry == min(Date_entry) | cumsum(Status == "Bad") == 1)

Here we keep the row with the minimum date, or the first "Bad" row, using a cumsum (cumulative sum) trick. The running count goes up by one each time a "Bad" is observed, so we select the rows where it equals 1 (if present).
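To make the running count visible, here is a minimal sketch that rebuilds a trimmed copy of the question's `problem` data frame and prints the count next to the filter result. The `mixed` data frame and the stricter `Status == "Bad"` condition are illustrative assumptions, not part of the original answer: they show what seems to happen when a "Good" row follows the first "Bad", since the count stays at 1 until the next "Bad".

```r
library(dplyr)

# Trimmed copy of the question's data (Date_status omitted for brevity)
problem <- data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date("2000-01-01") + 0:4,
  Value = c(150, 20, 14, 96, 4)
)

# Running count of "Bad" rows: 0 0 1 2 3
bad_count <- cumsum(problem$Status == "Bad")
print(bad_count)

# Earliest date OR the row where the running count first reaches 1
first_rows <- problem %>%
  filter(Date_entry == min(Date_entry) | cumsum(Status == "Bad") == 1)
print(first_rows)

# Hypothetical data (assumption): a "Good" row follows the first "Bad"
mixed <- data.frame(Status = c("Good", "Bad", "Good", "Bad"))

# The count is 0 1 1 2, so the plain condition keeps rows 2 and 3
loose <- mixed %>% filter(cumsum(Status == "Bad") == 1)

# Requiring Status == "Bad" as well keeps only row 2
strict <- mixed %>% filter(Status == "Bad", cumsum(Status == "Bad") == 1)
```

On the question's data both conditions give the same rows, because every row after the first "Bad" is also "Bad". The stricter form only matters when statuses alternate.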

score 0 · Answer 2 · answered Nov 23 '20 at 16:44

Something like this?

library(dplyr)
df <- data.frame(A=c(1,1,1,1,1,2,2,2,2,2),
                 B=c(1,2,3,4,5,1,2,3,4,5))
head(df %>% filter(A==1),1)

Marcos Pérez


score 0 · Answer 3 · answered Nov 23 '20 at 20:38

An option with slice

library(dplyr)
problem %>%
   slice(union(which.min(Date_entry), match('Bad', Status)))

Output:

#  Status Date_entry Date_status Value
#1   Good 2000-01-01  1999-01-01   150
#2    Bad 2000-01-03  1999-01-02    14
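As a rough illustration of what the index expression inside `slice()` evaluates to, the sketch below computes the same pieces in base R on a trimmed copy of the question's data. The variable names `i_min`, `i_bad`, and `idx` are made up for this example.

```r
# Trimmed copy of the question's data
problem <- data.frame(
  Status = c("Good", "Good", "Bad", "Bad", "Bad"),
  Date_entry = as.Date("2000-01-01") + 0:4
)

# Position of the earliest Date_entry
i_min <- which.min(problem$Date_entry)

# Position of the first "Bad"; match() returns NA if there is none
i_bad <- match("Bad", problem$Status)

# union() removes duplicates, so a "Bad" first row is kept only once
idx <- union(i_min, i_bad)
print(idx)

# Base R equivalent of slicing by those positions
print(problem[idx, ])
```

Because `union()` drops duplicates, a data frame whose first row is already "Bad" would yield a single row rather than the same row twice.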

akrun

