I have a dataset in which the precision of measurements varied between years, but the precision used in each year was not recorded. I therefore want to infer the precision from the measured values. For example, if all the measurements in a given year end in 0, I can infer that they were recorded to the nearest 10. Similarly, if all the measurements end in 0 or 5, I can infer that they were recorded to the nearest 5.
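
As a rough illustration of that rule, here is a minimal sketch on plain vectors. It assumes whole-number measurements, and `infer_step` is a hypothetical helper name used only for this example.

```r
# Hypothetical helper: classify one group of measurements by last digit.
# Assumes whole-number values; NA values are ignored.
infer_step <- function(x) {
  last_digit <- x[!is.na(x)] %% 10
  if (length(last_digit) == 0) return(NA_character_)   # nothing to infer from
  if (all(last_digit == 0)) return("By10")             # only 0s
  if (all(last_digit %in% c(0, 5))) return("By5")      # only 0s and 5s
  "By1"                                                # any other digit present
}

infer_step(c(10, 20, 30, 40, 50))  # "By10"
infer_step(c(10, 25, 30, 35, 50))  # "By5"
infer_step(c(18, 25, 32, 44, 57))  # "By1"
```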

Below are my example dataset and the function I am using to infer the precision of the values for each site in each year. The `str_sub` call throws an error that I don't know how to resolve, and the error persists regardless of which option I use to create `precisionType` (or whether I include that line at all).

library(dplyr)
library(stringr)
library(tidyr) # provides pivot_wider()

# example dataset with 3 years and 2 sites
# - first year measurements were to the closest 10
# - second year measurements were to the closest 5
# - third year measurements were to the closest 1
test <- data.frame(
  year = rep(c(rep(1995, 10), rep(1996, 10), rep(1997, 10)), 2),
  site = c(rep(LETTERS[1], 30), rep(LETTERS[2], 30)),
  mass = rep(c(rep(c(10,20,30,40,50), 2), rep(c(10, 25, 30, 35, 50), 2), rep(c(18, 25, 32, 44, 57), 2)), 2))

# function for inferring precision
fun_precision <- function(df, measurement, unit) {
  # set type of precision (e.g., LENGTH, MASS): 2 options and I have tried both
  precisionType <- paste(quo_name(enquo(measurement)), "PRECISION", sep = "_") # option 1
  # precisionType <- paste(deparse(substitute(measurement)), "PRECISION", sep = "_") # option 2
  
  # determine precision (i.e., by1, by5 or by10 units)
  precision <- df %>%
    # determine last digit in measurement
    mutate(last = as.numeric(str_sub(measurement, -1, -1))) %>%
    # group by year, site and last digit
    group_by(year, site, last) %>%
    # count number of times the last digit occurs
    summarise(n = n()) %>%
    # arrange dataframe from smallest to largest last digits
    arrange(last) %>%
    # switch from long to wide format
    pivot_wider(id_cols      = c(year, site),
                names_from   = last,
                names_prefix = "n", #appends n before last digit (e.g., n1, n2, etc.)
                values_from  = n) %>%
    # determine precision of measurement
    mutate(
      not0    = sum(c_across(c(n1:n9)), na.rm = TRUE),
      not0or5 = sum(c_across(c(n1:n4, n6:n9)), na.rm = TRUE),
      "{precisionType}" = case_when(nNA > 0 & not0 == 0 ~ NA_character_,
                                    not0 == 0           ~ paste("By10", unit, sep = ""),
                                    not0or5 > 0         ~ paste("By1", unit, sep = ""),
                                    !is.na(n0) & !is.na(n5) ~ paste("By5", unit, sep = ""))) %>%
    select(year, site, precisionType)
  
  # join to initial dataset
  df <- left_join(df, precision)
}

test <- fun_precision(test, mass, "g")
Error in `mutate()`:
! Problem while computing `last = as.numeric(str_sub(measurement, -1, -1))`.
Caused by error in `stri_sub()`:
! object 'mass' not found
Run `rlang::last_error()` to see where the error occurred.
tnt
  • Since you are passing in an unquoted column name, you should use the embrace syntax to inject the column name into the expression: `mutate(last = as.numeric(str_sub({{measurement}}, -1, -1)))`. You have errors after that on `c(YEAR, river)`, neither of which exactly matches a column name in your sample data. – MrFlick Dec 13 '22 at 20:41
  • @MrFlick thanks! that helps with the str_sub issue! updated question to replace those errors. – tnt Dec 13 '22 at 21:17
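
Building on MrFlick's comment, here is a minimal sketch of how the embrace fix could look in a complete function. It is not the original function: `infer_precision` is a hypothetical name, and the pivot to wide format is swapped for a grouped `summarise()` so that the code does not depend on which `n0`...`n9` columns happen to exist. Note that a glue-style name on the left-hand side needs `:=` rather than `=`.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not the function from the question): infers the
# precision per year and site from the last digit of an unquoted column.
infer_precision <- function(df, measurement, unit) {
  # Turn the unquoted column into a string, e.g. mass -> "mass_PRECISION"
  precision_col <- paste(rlang::as_label(rlang::enquo(measurement)),
                         "PRECISION", sep = "_")

  precision <- df %>%
    # {{ }} injects the caller's column into the data-masked expression
    mutate(last_digit = as.numeric(
      str_sub(as.character({{ measurement }}), -1, -1))) %>%
    # groups with no usable digits get NA after the join below
    filter(!is.na(last_digit)) %>%
    group_by(year, site) %>%
    summarise(
      "{precision_col}" := case_when(
        all(last_digit == 0)          ~ paste0("By10", unit),
        all(last_digit %in% c(0, 5))  ~ paste0("By5", unit),
        TRUE                          ~ paste0("By1", unit)
      ),
      .groups = "drop"
    )

  # join the inferred precision back to the initial dataset
  left_join(df, precision, by = c("year", "site"))
}

# Example dataset from the question
test <- data.frame(
  year = rep(c(rep(1995, 10), rep(1996, 10), rep(1997, 10)), 2),
  site = c(rep(LETTERS[1], 30), rep(LETTERS[2], 30)),
  mass = rep(c(rep(c(10, 20, 30, 40, 50), 2),
               rep(c(10, 25, 30, 35, 50), 2),
               rep(c(18, 25, 32, 44, 57), 2)), 2))

result <- infer_precision(test, mass, "g")
distinct(result, year, site, mass_PRECISION)
```

The sketch assumes the grouping columns are always named `year` and `site`, as in the sample data; they would need to become arguments if that varies.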
