How can I use a function to analyse all the rows in all the tibbles, having my data in a list of tibbles?

Question

I have a list of 106 tibbles. Each one contains two columns (date, temperature) with thousands of values.

I tried to write code that returns, for each tibble, the index of the row at which the temperature has been lower than 8.0 for the fourth time.

The problem is that my code only processes the first row of each tibble.

Here you can see the code:

pos_r = 0;
temp =0; 
posx = vector();
for (i in seq_along(data_sensor)){
  if (temp < 4){
    pos_r = pos_r + 1;
  if (data_sensor[[i]]$Temperature < 8.0){
       temp=temp+1;
} else if (temp == 4){
   posx[i] = pos_r;
   i = i+1;
}
}
}



> [1] NA NA NA NA NA NA  5  6 NA  7  8 NA NA  9 NA NA NA 10 11 NA 12 13 14 NA 15 16 17 18 19 NA
 [31] 20 21 22 NA 23 24 25 26 27 NA 28 NA 29 30 NA 31 32 33 34 NA 35 36 37 38 NA 39 40 41 42 43
 [61] 44 NA 45 NA 46 47 48 49 50 51 52 53 54 55 56 57 58 NA NA NA 59 60 61 NA 62 63 NA 64 65 66
 [91] NA 67 NA NA 68 69 70 71 72 73 74 75 76 77 78 79

How can I treat all the rows of every single tibble of the list?

eipi10 · Accepted Answer · 2021-08-28T16:09:52.330

Here's one option: In the code below we use logical tests to find the index of the row for which temperature has been below 8 on four days. Then we use map to implement this method on each data frame in the list.

library(tidyverse)

# Generate a list of 5 data frames to work with
set.seed(33)
dl = replicate(5, tibble(date=seq(as.Date("2021-01-01"), as.Date("2021-02-01"), by="1 day"),
                         temperature = 10 + cumsum(rnorm(length(date), 0, 3))),
               simplify=FALSE)

# Index of row of fourth day with temperature lower than 8
# Run this on the first data frame in the list
min(which(cumsum(dl[[1]][["temperature"]] < 8) == 4))
#> [1] 8

# Run the method on each data frame in the list
# Note that infinity is returned (with a warning) if no data row meets the condition
idx8 = dl %>% 
  map_dbl(~ min(which(cumsum(.x[["temperature"]] < 8) == 4)))

idx8
#> [1]   8  29 Inf   7   6
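
If the Inf (and the warning from calling min on an empty vector) is unwanted, one option is to wrap the logic in a small helper that returns NA instead. The following is only a sketch in base R: the name nth_below and the toy data are made up for illustration and are not part of the original answer. The same helper could be passed to map_dbl or map_int in place of the anonymous function.

```r
# Hypothetical helper: row index of the n-th value below `threshold`,
# or NA if fewer than n values qualify
nth_below <- function(x, threshold = 8, n = 4) {
  hits <- which(x < threshold)   # indices of all qualifying rows (NAs are dropped)
  if (length(hits) >= n) hits[n] else NA_integer_
}

# Small made-up list of data frames for illustration
toy <- list(
  data.frame(temperature = c(9, 7, 6, 10, 5, 4, 12)),
  data.frame(temperature = c(9, 9, 7, 9, 9, 9, 9))
)

# Apply the helper to every data frame in the list
idx <- vapply(toy, function(d) nth_below(d[["temperature"]]), integer(1))
idx
#> [1]  6 NA
```

Taking the n-th element of which(...) gives the same row as min(which(cumsum(...) == n)) whenever that row exists, and it makes the "not found" case explicit.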

Here are the individual steps illustrated on the first data frame in the list:

# Logical vector returning TRUE when temperature is less than 8
dl[[1]][["temperature"]] < 8
#>  [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
#> [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# Cumulative number of days where temperature was less than 8
cumsum(dl[[1]][["temperature"]] < 8) 
#>  [1] 0 0 0 0 1 2 3 4 4 5 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7

# Index of rows for which the cumulative number of days where 
#  temperature was less than 8 is equal to 4
which(cumsum(dl[[1]][["temperature"]] < 8) == 4)
#> [1] 8 9

# We want the index of the first row that meets the condition
min(which(cumsum(dl[[1]][["temperature"]] < 8) == 4))
#> [1] 8

Get the indicated row from each data frame, or missing values if there's no row that satisfied the condition. Return the result as a data frame:

list(dl, idx8) %>% 
  pmap_dfr(~ { 
    if(is.infinite(.y)) {
      tibble(date=NA, temperature=NA)
    } else {
      .x %>% 
        slice(.y) %>% 
        mutate(row.index=.y) %>% 
        relocate(row.index)
    }
  },
  .id="data.frame")
#> # A tibble: 5 × 4
#>   data.frame row.index date       temperature
#>   <chr>          <dbl> <date>           <dbl>
#> 1 1                  8 2021-01-08       7.12 
#> 2 2                 29 2021-01-29      -0.731
#> 3 3                 NA NA              NA    
#> 4 4                  7 2021-01-07       6.29 
#> 5 5                  6 2021-01-06       4.58
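
As for why the loop in the question only looks at the first row: data_sensor[[i]]$Temperature < 8.0 is a logical vector with one element per row, but if expects a single TRUE or FALSE. Older versions of R used only the first element and issued a warning; from R 4.2.0 onwards this is an error. For comparison, here is a sketch of an explicit-loop version that walks over the rows of each tibble. The data are made up for illustration, and the variable names temps and count are not from the question.

```r
# Made-up data standing in for the real data_sensor list
data_sensor <- list(
  data.frame(Temperature = c(9, 7, 6, 10, 5, 4, 12)),
  data.frame(Temperature = c(9, 9, 7, 9, 9, 9, 9))
)

# One result per tibble; NA means "fewer than four readings below 8.0"
posx <- rep(NA_integer_, length(data_sensor))

for (i in seq_along(data_sensor)) {
  temps <- data_sensor[[i]]$Temperature
  count <- 0                      # reset the counter for every tibble
  for (j in seq_along(temps)) {   # inner loop over the rows
    if (!is.na(temps[j]) && temps[j] < 8.0) {
      count <- count + 1
      if (count == 4) {
        posx[i] <- j              # row of the fourth reading below 8.0
        break
      }
    }
  }
}
posx
#> [1]  6 NA
```

The vectorised cumsum approach above does the same work without the inner loop and is usually the more idiomatic choice in R.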

If I want to get the index at which the condition is met for the third time in a row, what can I do? I tried the code below, but it just gives me an index of one when true, which is not useful. `idx8 = data_sensor %>% map_dbl(~min(which(cumsum(.x[["Temperature"]] < 8.0) == 3))); idx84 = data_sensor %>% map_dbl(~min(which(cumsum(.x[["Temperature"]] < 8.0) == 4))); idx83 = idx84-idx8` - M_1, Sep 08 '21 at 12:58

M_1 · Answer 2 · 2021-09-11T04:20:22.537

eipi10's answer solved my original question. Later on I needed to find the first time that the temperature is lower than 8.0 three times in a row, using exactly the same data.

The following code is a possible solution for this case:

idx84 = data_sensor %>% 
  map_dbl(~min(which(cumsum(.x[["Temperature"]] < 8.0) == 4)))
idx87 = data_sensor %>% 
  map_dbl(~min(which(cumsum(.x[["Temperature"]] < 8.0) == 7)))
idx8=idx87-idx84

In this example we compare the index of the 7th occurrence (idx87) with the index of the 4th occurrence (idx84).

Along idx8, we use n to count the number of times the combined condition is TRUE.

The mask is used when we want to analyse another range after a match has already been found in a previous range: we keep the index already found, prioritizing the first occurrence.

For example, if seventy values have been found between idx87 and idx84, the mask marks these values with 0. If we then want the values between idx88 and idx85, we do not change the values that the mask has marked with 0.

Finally, wherever the combined condition is TRUE, we save the idx87 index in idx_pos8[i].

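# Initialise the counter and the result vectors before the loop
n = 0
mask_idx8 = rep(NA_real_, length(idx8))
idx_pos8 = rep(0, length(idx8))

for (i in seq_along(idx8)){
  # A difference of exactly 3 (and not NA) means the condition is met
  if (!is.na(idx8[i]) && idx8[i] == 3){
    n = n + 1
    mask_idx8[i] = 0
    idx_pos8[i] = idx87[i]
  # Otherwise the difference is NA or differs from 3
  } else {
    idx_pos8[i] = 0
    mask_idx8[i] = idx8[i]
  }
}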
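
Note that the difference test above looks at one particular window of occurrences (the 4th to the 7th). As an alternative sketch, not the method used in this answer, base R's rle can locate the first run of consecutive readings below the threshold directly. The helper name first_run_end and its arguments are hypothetical.

```r
# Hypothetical helper: index of the row that completes the first run of
# `run_len` consecutive values below `threshold`, or NA if there is none
first_run_end <- function(x, threshold = 8, run_len = 3) {
  below <- !is.na(x) & x < threshold        # treat NA readings as "not below"
  r <- rle(below)                           # lengths and values of each run
  ends <- cumsum(r$lengths)                 # last index of each run
  starts <- ends - r$lengths + 1            # first index of each run
  ok <- which(r$values & r$lengths >= run_len)
  if (length(ok) == 0) return(NA_integer_)
  as.integer(starts[ok[1]] + run_len - 1)
}

# Rows 5, 6 and 7 are the first three consecutive readings below 8
first_run_end(c(9, 7, 6, 10, 5, 4, 3, 12))
#> [1] 7
```

With the real data this could be applied per tibble, for example with map_int(data_sensor, ~ first_run_end(.x[["Temperature"]])), assuming the column is named Temperature as in the question.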

