2

I am trying to streamline the process of auditing chemistry laboratory data. When we encounter data where an analyte is not detected I need to change the recorded result to a value equal to 1/2 of the level of detection (LOD) for the analytical method. I have LOD's contained within another dataframe to be used as a lookup table.

I have multiple columns representing data from different analytical tests, each with it's own unique LOD. Here's an example of the type of data I am working with:

library(tidyverse)
dat <- tibble("Lab_ID" = as.character(seq(1,10,1)),
                 "Tributary" = c('sawmill','paint', 'herring', 'water',
                               'paint', 'sawmill', 'bolt', 'water',
                               'herring', 'sawmill'),
                 "date" = rep(as.POSIXct("2021-10-01 12:00:00"), 10),
                 "TP" = c(1.5,15.7,-2.3,7.6,0.1,45.6,12.2,-0.1,22.2,0.6),
                 "TN" = c(100.3,56.2,-10.5,0.4,-0.3,11.0,45.8,256.0,12.2,144.0),
                 "DOC" = c(56.0,120.3,-10.5,0.2,14.6,489.3,0.3,14.4,54.6,88.8))
dat


detect_level <- tibble("Parameter" = c('TP', 'TN', 'DOC'),
                          'LOD' = c(0.6, 11, 0.3)) %>% 
  mutate(halfLOD=LOD/2)
detect_level

I have poured over multiple other questions with a similar theme:

Change values in multiple columns of a dataframe using a lookup table

R - Match values from multiple columns in a data.frame to a lookup table.

Replace values in multiple columns using different thresholds

and gotten to a point where I have pivoted the data and split it out into a list of dataframes that are specific analytes:

dat %>% 
    pivot_longer(cols = c('TP','TN','DOC')) %>% 
    arrange(name) %>% 
    split(.$name) 

I have tried to apply a function using map(), however I cannot figure out how to integrate the values from the lookup table (detect_level) into my code. If someone could help me continue this pipe, or finish the process to achieve a final product dat2 that should look like this I would appreciate it:

dat2 <- tibble("Lab_ID" = as.character(seq(1,10,1)),
              "Tributary" = c('sawmill','paint', 'herring', 'water',
                              'paint', 'sawmill', 'bolt', 'water',
                              'herring', 'sawmill'),
              "date" = rep(as.POSIXct("2021-10-01 12:00:00"), 10),
              "TP" = c(1.5,15.7,0.3,7.6,0.3,45.6,12.2,0.3,22.2,0.6),
              "TN" = c(100.3,56.2,5.5,5.5,5.5,11.0,45.8,256.0,12.2,144.0),
              "DOC" = c(56.0,120.3,0.15,0.15,14.6,489.3,0.3,14.4,54.6,88.8))

dat2

Another possibility would be from the closest similar question I have found is:

Lookup multiple column from a single table

Here's a snippet of code that I have adapted from this question, however, if you run it you will see that where values exist that are not found in detect_level an NA is returned. Additionally, it does not appear to have worked for $TN or $DOC, even in cases when the $LOD value from detect_level was present.

dat %>% 
    mutate(across(all_of(unique(detect_level$Parameter)),
                  ~ {i1 <- detect_level$Parameter == cur_column()
                  detect_level$LOD[i1][match(., detect_level$LOD)]}))

I am not comfortable at all with the purrr language here and have only adapted this code from the question linked, so I would appreciate if this is the direction an answerer chooses, that they might comment code to explain briefly what is happening "under the hood".

Thank you in advance!

dandrews
  • 967
  • 5
  • 18

1 Answers1

2

Perhaps this helps

library(dplyr)
dat %>%
   mutate(across(all_of(detect_level$Parameter), 
     ~ pmax(., detect_level$LOD[match(cur_column(), detect_level$Parameter)])))

For the updated case

dat %>%
   mutate(across(all_of(detect_level$Parameter), 
     ~ replace(., . < detect_level$LOD[match(cur_column(), 
    detect_level$Parameter)],detect_level$halfLOD[match(cur_column(), 
    detect_level$Parameter)])))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Wondering if you could take this a step further. I have edited `detect_level` to include values for 1/2 of the LOD, which is what I need to change the values to for those data in `dat` that are less than the LOD value. Can you suggest a way to replace values of `dat` below the `$LOD` value with `$halfLOD`? – dandrews Oct 22 '21 at 18:30
  • 1
    Try the update in the solution – akrun Oct 23 '21 at 04:30