r - Filtering with dataframe leads partially to NAs

Question

I am measuring electric current (µA) over a certain time interval (s) for 4 different channels (chan_n) and this is how my data looks:

dat 

s       µA            chan_n
<dbl>   <dbl>        <chr>
0.00    -0.03167860   1     
0.02    -0.03136610   1     
0.04    -0.03118490   1     
0.06    -0.03094740   1     
0.08    -0.03065360   1     
0.10    -0.03047860   1     
0.12    -0.03012230   1     
0.14    -0.02995980   1     
0.16    -0.02961610   1
...     ...           ...

My end goal is to get the current at a certain time after the peak value. Therefore I first get the timepoints at which the maximum appears for each channel:

BaselineTime <- dat %>%
  group_by(chan_n) %>%
  slice(which.max(µA)) %>%  # keep the row with the max current for each channel
  transmute(s = s + 30)     # add 30 to the timepoint at which the max value appears


chan_n s
<chr>  <dbl>
1      539.84           
2      540.00           
3      539.82           
4      539.80

But if I use BaselineTime to filter for my current values I get two NAs:

BaselineVal <- right_join(dat, BaselineTime, by = c("chan_n", "s"))


s       µA          chan_n
<dbl>   <dbl>       <chr>
540.00  0.00364974  2       
539.80  0.00610948  4       
539.84  NA          1       
539.82  NA          3

I checked whether the time values exist for channels 1 and 3, and they do. Also, if I create a data frame manually by hardcoding the time values and use it for filtering, it works just fine. So why isn't it working? I would be very happy for any suggestions or explanations. I think it might have something to do with the decimal places, as for channels 2 and 4 there is a 0 in the last decimal place.

Welcome to SO! Unfortunately your question doesn't provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your issue so it will be hard for anyone to help much. It sounds like you've already checked the obvious issue. Can you find the minimal subset of your data that still produces the problem and share that using any of the techniques mentioned in the linked question? — Dan Adams, Feb 01 '22 at 15:08

score 0 · Answer 1 · answered Feb 01 '22 at 15:13

Untested as the sample data isn't suitable for testing. I would try something like this:

dat %>%
  group_by(chan_n) %>%
  mutate(
    is_peak = row_number() == which.max(µA),             # TRUE on the peak row of each channel
    post_peak = lag(is_peak, n = 30, default = FALSE)    # TRUE 30 rows after the peak
  )

This will give a TRUE in the new post_peak column 30 rows after the peak, so you can trivially ... %>% filter(post_peak) or do whatever you need to with the result.
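To make the idea concrete, here is a minimal sketch on made-up data. The `toy` tibble, the ASCII column name `uA`, and the offset `n_after = 3` are assumptions for illustration only, not the asker's data. Keep in mind that `n` in `lag()` counts rows, not seconds: if the 0.02 s spacing shown in the question is constant, 30 s would correspond to 1500 rows.

```r
library(dplyr)

# Made-up data for illustration: 2 channels, 10 samples each
toy <- tibble(
  chan_n = rep(c("1", "2"), each = 10),
  s      = rep(seq(0, by = 0.02, length.out = 10), times = 2),
  uA     = c(1, 2, 5, 9, 7, 6, 4, 3, 2, 1,     # channel 1 peaks at row 4
             0, 4, 8, 6, 5, 3, 2, 1, 0.5, 0)   # channel 2 peaks at row 3
)

n_after <- 3  # hypothetical offset in rows (the answer above uses 30)

post_peak_rows <- toy %>%
  group_by(chan_n) %>%
  mutate(
    is_peak   = row_number() == which.max(uA),
    post_peak = lag(is_peak, n = n_after, default = FALSE)
  ) %>%
  filter(post_peak) %>%   # keep only the row n_after rows after each peak
  ungroup()

print(post_peak_rows)
```

Because the offset is expressed in rows within each group, no time values are ever compared, so the result does not depend on how the timepoints are stored.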

If you need more help than this, please share some data that illustrates the problem better, e.g., 10 rows each of 2 chan_n groups with the goal of finding the row 3 after the peak (and that row existing in the data).
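As for the NAs from the right_join in the question: one possible cause, which cannot be confirmed without the actual data, is floating-point representation. A join on a double column matches on exact equality, and a computed value such as s + 30 can differ in its last bits from the value stored in the data even though both print identically. The sketch below uses made-up values (`measured`, `target`, and the helper `to_key` are hypothetical names) to show the effect and one way around it, joining on a rounded integer key.

```r
library(dplyr)

# Made-up values: 0.1 + 0.2 prints as 0.3 but is not stored as the same double
measured <- tibble(chan_n = "1", s = 0.3, uA = -0.03)
target   <- tibble(chan_n = "1", s = 0.1 + 0.2)

measured$s == target$s                    # exact comparison fails
isTRUE(all.equal(measured$s, target$s))   # comparison with tolerance succeeds

# Exact join on the double column finds no match, so uA comes back as NA
exact <- right_join(measured, target, by = c("chan_n", "s"))

# Hypothetical helper: express time in hundredths of a second as an integer key
to_key <- function(x) as.integer(round(x * 100))

# Joining on the integer key sidesteps the exact-equality problem
keyed <- right_join(
  mutate(measured, s_key = to_key(s)),
  transmute(target, chan_n, s_key = to_key(s)),
  by = c("chan_n", "s_key")
)

print(exact)
print(keyed)
```

The factor of 100 assumes timepoints with two decimal places, as in the question's sample; a different resolution would need a different factor.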

Thank you very much! That gives me what I need in much more elegant way. — user18046926, Feb 01 '22 at 15:32
