Find the nth value of a vector with an occurrence condition tidyverse R?

Question

I would like to identify the index (position) of values in a vector that satisfy an occurrence condition.

I have a data frame with 2280 rows and three columns: "Image_series_names", "Image_number" and "Convergence_type".

Here is a description of my data frame:

The "Image_series_names" column is a character column whose value changes every 30 rows, so there are 2280/30 = 76 different strings. The "Image_number" column is an index that runs from 1 to 30 within each series (there are 30 images for each "Image_series_names" value). The "Convergence_type" column has two possible values: "convergence" and "no_convergence".

My goal is to identify, for each "Image_series_names" value, the first "Image_number" whose "Convergence_type" is "convergence", but only if the 4 following rows also have the value "convergence".

I hope I have described my problem correctly, as I don't know how to post my data frame.

Thank you for your kind support and your reading. Best regards.

I don't know what to google to find a solution. If possible, I would prefer a tidyverse solution, as it is easier for me to understand.

Sample data and expected output given that sample data would be immensely useful, "describing" data has some limited utility when looking for implementations. — r2evans, Jun 28 '23 at 16:42
Please read this question and edit your question accordingly: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Matt Summersgill, Jun 28 '23 at 18:14

Melissa Key · Accepted Answer · 2023-06-30T21:09:09.860

Try

library(tidyverse)
library(zoo)  # provides rollsum()

df |>
  mutate(
    # TRUE for any row where it and the next 4 rows in the series all converge;
    # fill = NA pads the result back to the length of the group
    conv5 = rollsum(Convergence_type == "convergence", k = 5, align = "left", fill = NA) == 5,
    .by = Image_series_names
  ) |>
  summarize(
    # position of the first such row within each series (NA if there is none)
    first_conv = which(conv5)[1],
    .by = Image_series_names
  )

I cannot test this without sample data, so you may need to make some adjustments.
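As a rough check of the pipeline above, here is a minimal sketch on made-up data. The data frame, the series names `series_A` and `series_B`, the helper `conv()`, and the shortened length of 8 images per series are all assumptions for illustration, not the asker's real data. It loads `dplyr` instead of the full tidyverse, and it assumes R >= 4.1 (for `|>`) and dplyr >= 1.1.0 (for `.by`).

```r
library(dplyr)  # mutate()/summarize() with .by need dplyr >= 1.1.0
library(zoo)    # rollsum()

# Hypothetical helper: turn TRUE/FALSE flags into the two labels from the question
conv <- function(flags) ifelse(flags, "convergence", "no_convergence")

# Hypothetical toy data: 2 series of 8 images each (the real data has 76 x 30)
df <- data.frame(
  Image_series_names = rep(c("series_A", "series_B"), each = 8),
  Image_number       = rep(1:8, times = 2),
  Convergence_type   = c(
    conv(c(FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)),  # run of 5 starts at image 4
    conv(c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE))    # never 5 in a row
  )
)

result <- df |>
  mutate(
    # TRUE where the row and the next 4 rows of the same series all converge
    conv5 = rollsum(Convergence_type == "convergence", k = 5, align = "left", fill = NA) == 5,
    .by = Image_series_names
  ) |>
  summarize(
    # position of the first TRUE within the series, NA if there is none
    first_conv = which(conv5)[1],
    .by = Image_series_names
  )

print(result)
```

With this toy input, `first_conv` should be 4 for `series_A` and `NA` for `series_B`. The value is a row position within the series, so it matches `Image_number` only when the rows of each series are ordered from 1 upward, as described in the question.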

edited Jun 30 '23 at 21:09

answered Jun 28 '23 at 17:27

Add `fill=NA` argument to `rollsum` call. – G. Grothendieck Jun 29 '23 at 12:41
Yes, you're right. `fill = NA` is necessary in my case. – fgardavaud Jun 30 '23 at 17:17
I added the `fill = NA` to the solution for completeness. Thanks! – Melissa Key Jun 30 '23 at 21:09
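To illustrate why the comments above ask for `fill = NA`, here is a small sketch on a made-up logical vector (the vector `x` and the names `short` and `padded` are assumptions for illustration). Without `fill`, `rollsum()` returns only the complete windows, so the result is shorter than the input and cannot be assigned as a column inside `mutate()`.

```r
library(zoo)

# Hypothetical input: 7 flags, window width 5
x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)

short  <- rollsum(x, k = 5, align = "left")             # complete windows only
padded <- rollsum(x, k = 5, align = "left", fill = NA)  # padded with NA to length(x)

length(short)   # 3 complete windows
length(padded)  # 7, same length as x
padded == 5     # TRUE only at position 1, NA where the window runs off the end
```

With `align = "left"`, the padding lands at the end of the vector, which is where a window of 5 no longer fits.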

score 0 · Answer 2 · answered Jun 30 '23 at 16:58

Thanks a lot @Melissa Key,

It works with a minor change (`fill = NA` was added):

library(tidyverse)
library(zoo)  # provides rollsum()

df |>
  mutate(
    # TRUE for any row where it and the next 4 rows in the series all converge
    conv5 = rollsum(Convergence_type == "convergence", fill = NA, k = 5, align = "left") == 5,
    .by = Image_series_names
  ) |>
  summarize(
    # position of the first such row within each series
    first_conv = which(conv5)[1],
    .by = Image_series_names
  )

Apologies to the community for not posting sample data (thanks for the tutorial linked in the comments). I will do better next time.
