
I would like to identify the index (position) of values in a vector, subject to an occurrence condition.

I have a dataframe with three columns ("Image_series_names", "Image_number" and "Convergence_type") and 2280 rows.

Here is a description of my dataframe:

The "Image_series_names" column is a character column whose value changes every 30 rows, so there are 2280/30 = 76 different strings. The "Image_number" column is an index that runs from 1 to 30 within each series (there are 30 images for each "Image_series_names" value). The "Convergence_type" column has two values: "convergence" and "no_convergence".

My goal is to identify, for each "Image_series_names" value, the first "Image_number" whose "Convergence_type" is "convergence", but only if the 4 following rows also have the value "convergence".

I hope I have described my problem clearly, as I don't know how to share my dataframe here.

Thank you for your kind support and for reading. Best regards.

I don't know what to search for to find a solution. If possible, I would prefer a tidyverse solution, as it is easier for me to understand.

fgardavaud
  • Sample data and expected output given that sample data would be immensely useful; "describing" data has some limited utility when looking for implementations. – r2evans Jun 28 '23 at 16:42
  • Please read this question and edit your question accordingly: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Matt Summersgill Jun 28 '23 at 18:14

2 Answers


Try

library(tidyverse)
library(zoo)  # rollsum function


df |>
  mutate(
    conv5 = rollsum(Convergence_type == "convergence", k = 5, align = 'left', fill = NA) == 5,  # this should identify any row where it (plus the next 4) converge
    .by = Image_series_names
  ) |>
  summarize(
    first_conv = which(conv5)[1],  # this grabs the first case where it all works.  
    .by = Image_series_names
  )

I cannot test this without sample data, so you may need to make some adjustments.
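Since no sample data was posted, here is a minimal sketch for checking the approach on a made-up data frame. The toy data (two series, "A" and "B", with 10 images each) is purely hypothetical and only mimics the column names from the question. It also returns `Image_number` at the first match rather than the row position, which is an assumption on my part about the desired output; the two coincide only when rows are sorted 1 to 30 within each series. The `.by` argument needs dplyr 1.1.0 or later.

```r
library(dplyr)
library(zoo)  # rollsum function

# Hypothetical toy data: 2 series of 10 images (the real data has 76 series of 30)
df <- data.frame(
  Image_series_names = rep(c("A", "B"), each = 10),
  Image_number = rep(1:10, times = 2),
  Convergence_type = c(
    # series A: images 2-3 converge, image 4 does not, images 5-10 all converge
    "no_convergence", "convergence", "convergence", "no_convergence",
    rep("convergence", 6),
    # series B: alternates, so there is never a run of 5
    rep(c("convergence", "no_convergence"), times = 5)
  )
)

result <- df |>
  arrange(Image_series_names, Image_number) |>  # make sure images are in order
  mutate(
    # 1/0 flags summed over a window of 5 rows starting at the current row;
    # the last 4 rows of each series have no full window and become NA
    conv5 = rollsum(as.integer(Convergence_type == "convergence"),
                    k = 5, align = "left", fill = NA) == 5,
    .by = Image_series_names
  ) |>
  summarize(
    # which() drops NA, and [1] of an empty result is NA, so a series
    # without any run of 5 yields NA instead of an error
    first_conv = Image_number[which(conv5)[1]],
    .by = Image_series_names
  )

print(result)
```

With this toy data, series "A" should report 5 (images 5 to 9 all converge) and series "B" should report NA.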

Melissa Key

Thanks a lot @Melissa Key,

It works with a minor change (fill = NA was added):

library(tidyverse)
library(zoo)  # rollsum function


df |>
  mutate(
    conv5 = rollsum(Convergence_type == "convergence", fill = NA, k = 5, align = 'left') == 5,  # this should identify any row where it (plus the next 4) converge
    .by = Image_series_names
  ) |>
  summarize(
    first_conv = which(conv5)[1],  # this grabs the first case where it all works.  
    .by = Image_series_names
  )

Sorry to the community for not posting sample data (thanks for the tutorial linked in the comments). I will do better next time.

fgardavaud