0

I've written a function to find and save the first derivative to a png file. It takes a data.frame input and a name input and applies this to filter() and creates the PNG name using paste0 then calculates and saves the first derivative.

I need to apply this A LOT of times, Is there a way to apply it quickly to a vector of names? So far I've been using a hacky way, involving Excel to quickly paste down multiple columns and then merge them together, but I feel like there should be a nice way in R. I've made the data an external input for this example but as its all running off one data frame, I have just been including the data input within the actual function so that the only external input is the name... if that makes sense.

The function:

first_deriv <- function(data, site_name) {
  require(pspline)
  require(dplyr)
  png_name <- paste0(site_name, ".png")
  data <- data %>%
    filter(site == site_name) %>%
    select(age, depth)
  age <- data %>% pull(age)
  depth <- data %>%  pull(depth)
  predict <- predict(sm.spline(x = depth, y = age), depth, 1)
  png(filename = png_name,
      width = 600,
      height = 350)
  plot(predict, main = site_name, xlab = "depth")
  dev.off()
} 

Data example:

df <- structure(list(site = c("4NT", "4NT", "4NT", "4NT", "4NT", "10T", 
"10T", "10T", "10T", "10T", "5T", "5T", "5T", "5T", "5T"), age = c(-62.1, 
-59.7, -57.3, -54.9, -52.5, -62.4, -61.4, -60.4, -59.4, -58.4, 
-62.3, -61.2, -60.1, -59, -57.9), depth = c(1, 2, 3, 4, 5, 1, 
2, 3, 4, 5, 1, 2, 3, 4, 5)), row.names = c(NA, -15L), class = "data.frame")

names <- c("10t", "4NT", "5T")

How it currently runs:

first_deriv(df, "10T")
first_deriv(df, "4NT")
first_deriv(df, "5T")

Cheers, Paul.

Paul Tansley
  • 171
  • 7
  • Does this answer your question? [R ~ Vectorization of a user defined function](https://stackoverflow.com/questions/50766836/r-vectorization-of-a-user-defined-function) – ekoam Nov 23 '20 at 10:51

2 Answers2

1

So you want to do this for every name in site? If so, a simple for-loop should work:

all_sites <- unique(df$site) #get all site names
for(s in all_sites) first_deriv(df, s) #apply function to each site name
benimwolfspelz
  • 679
  • 5
  • 17
  • I'm pretty new to R and I've not used for loops before. Could you just explain what the s is doing? Thanks. – Paul Tansley Nov 23 '20 at 11:36
  • `s` is a variable that has a different value at each cycle of the loop, assuming each value in the vector `all_sites`. Using `unique`, I made `all_sites` a vector containing every site name there is, but only once each. You can read this code litterally as "For every site name, make `s` the site name and then do `first_deriv(df, s)`." – benimwolfspelz Nov 23 '20 at 13:10
1

If you want to stays in base R, then you can use by. I provide an example below where I altered your first_deriv function such that it can run without errors on the data set you provided (replacing sm.spline with lm):

# like your function
require(dplyr)
first_deriv_org <- function(data, site_name) {
  data <- data %>%
    filter(site == site_name) %>%
    select(age, depth)
  predict(lm(age ~ depth, data))
} 

# the new function to use
first_deriv_new <- function(dat) 
  predict(lm(age ~ depth, dat))

# they give the same
by(df, df$site, first_deriv_new)
#R> df$site: 10T
#R>     6     7     8     9    10 
#R> -62.4 -61.4 -60.4 -59.4 -58.4 
#R> ------------------------------------------------------------------------------------------------- 
#R> df$site: 4NT
#R>     1     2     3     4     5 
#R> -62.1 -59.7 -57.3 -54.9 -52.5 
#R> ------------------------------------------------------------------------------------------------- 
#R> df$site: 5T
#R>    11    12    13    14    15 
#R> -62.3 -61.2 -60.1 -59.0 -57.9 

first_deriv_org(df, "10T")
#R>     1     2     3     4     5 
#R> -62.4 -61.4 -60.4 -59.4 -58.4 
first_deriv_org(df, "4NT")
#R>     1     2     3     4     5 
#R> -62.1 -59.7 -57.3 -54.9 -52.5 
first_deriv_org(df, "5T")
#R>     1     2     3     4     5 
#R> -62.3 -61.2 -60.1 -59.0 -57.9 

This should be much faster for larger data sets if the function you need to evaluate, first_deriv, is not too slow. You can use the following if you do not want to see the output:

invisible(by(df, df$site, first_deriv_new))

Combining all of this and simplifying your function can give you:

first_deriv <- function(dat) {
  require(pspline)
  png_name <- paste0(dat$site[[1L]], ".png")
  predict <- with(dat, predict(sm.spline(x = depth, y = age), depth, 1))
  png(filename = png_name, width = 600, height = 350)
  plot(predict, main = site_name, xlab = "depth")
  dev.off()
} 
invisible(by(df, df$site, first_deriv))

Data

df <- structure(list(
  site = c("4NT", "4NT", "4NT", "4NT", "4NT", "10T", 
           "10T", "10T", "10T", "10T", "5T", "5T", "5T", "5T", "5T"), 
  age = c(-62.1, -59.7, -57.3, -54.9, -52.5, -62.4, -61.4, -60.4, -59.4, 
          -58.4, -62.3, -61.2, -60.1, -59, -57.9), 
  depth = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5)), 
  row.names = c(NA, -15L), class = "data.frame")
  • Hi Benjamin, thanks for your answer. I need to get in the habit of actually testing my reprex actually works... We decided to use spline rather than lm as some of the changes we're looking for could be very small and we didn't want lm to smooth anything out. Thanks for your suggestion though. – Paul Tansley Nov 23 '20 at 11:34
  • Happy to help. Please, find the updated version where I suggest a solution in a couple of lines using `by` with the `sm.spline` function. – Benjamin Christoffersen Nov 23 '20 at 11:48
  • Thats great, I'll give it a go – Paul Tansley Nov 23 '20 at 12:12