Scoping and functions that only give results for first row of data

Question

Forgive me, this is my first time asking a question online.

First, setting up some data to make the question easier to ask:

location <- c(1, 2, 3, 4)
numerator_estimate <- c(625, 180, 210, 1753)
numerator_variance <- c(22165, 2451, 11610, 172968)
denominator_estimate <- c(2278 , 4742, 1115, 26892)
denominator_variance <- c(15870, 688, 7172, 1908288)
my_df <-data.frame(location, numerator_estimate, numerator_variance, denominator_estimate, denominator_variance)

This function bootstraps the SE of a quotient, given the estimate and variance of both the numerator and the denominator:

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, iterations = 10000){
  numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
  quotient_sim <- numerator_sim/denominator_sim
  quotient_sim_se <- sd(quotient_sim)
  return(quotient_sim_se)
}

This function calculates the quotient. It is included to show that calculate_quotient_se is not working, while another function called the same way does work.

calculate_quotient <- function(numerator_estimate_f,denominator_estimate_f){
  quotient <- numerator_estimate_f/denominator_estimate_f
}

library(dplyr)  # provides %>% and mutate()

my_df2 <- my_df %>%
  mutate(quotient_se = calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000),
         quotient = calculate_quotient(numerator_estimate, denominator_estimate))
my_df2

Note how quotient_se only seems to work for the first row, and that SE is copied down to each additional row.

It doesn't work this way either:

my_df$q_se <- calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000)
my_df

It will work if I type everything in like this:

(x1 <- calculate_quotient_se(625, 22165, 2278, 15870))
(x2 <- calculate_quotient_se(180, 2451, 4742, 688))
(x3 <- calculate_quotient_se(210, 11610, 1115, 7172))
(x4 <- calculate_quotient_se(1753, 172968, 26892, 1908288))

Any suggestions on how I can get the simulated SE in the dataframe for more calculations?

Have a look at this: https://stackoverflow.com/questions/15059076/call-apply-like-function-on-each-row-of-dataframe-with-multiple-arguments-from-e – Bulat, Mar 06 '18 at 00:13

score 0 · Answer 1 · answered Mar 06 '18 at 00:18

my_df$quotient_se <- 
    apply(my_df, 1, function(x) calculate_quotient_se(x[2], x[3], x[4], x[5]))

my_df$quotient <- 
    apply(my_df, 1, function(x) calculate_quotient(x[2],x[4]))

AidanGawronski

`apply` coerces to a matrix, which will cause a problem if data has different types. A safer base R approach is to use `Map`/`mapply`, though you'd have to wrap in `do.call` and `c` here: `do.call(mapply, c(calculate_quotient_se, my_df[-1]))` or subset each argument like you did above. – alistaire Mar 06 '18 at 01:35
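As a minimal sketch of the `mapply` route mentioned in this comment, the block below passes each column explicitly rather than going through `do.call`, so it does not depend on column order or on which extra columns have been added to `my_df`. The data and function are copied from the question; the `set.seed` value is arbitrary and only there to make the run repeatable.

```r
# Self-contained copy of the question's data.
my_df <- data.frame(
  location = c(1, 2, 3, 4),
  numerator_estimate = c(625, 180, 210, 1753),
  numerator_variance = c(22165, 2451, 11610, 172968),
  denominator_estimate = c(2278, 4742, 1115, 26892),
  denominator_variance = c(15870, 688, 7172, 1908288)
)

# Same logic as the question's function: simulate both parts, take the SD of the ratio.
calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f,
                                  denominator_estimate_f, denominator_variance_f,
                                  iterations = 10000) {
  numerator_sim <- rnorm(iterations, mean = numerator_estimate_f,
                         sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(iterations, mean = denominator_estimate_f,
                           sd = sqrt(denominator_variance_f))
  sd(numerator_sim / denominator_sim)
}

set.seed(47)  # arbitrary seed, for reproducibility only

# mapply() calls the function once per row, taking the i-th element of each
# vector as the i-th set of arguments, and simplifies the results to a vector.
my_df$quotient_se <- mapply(
  calculate_quotient_se,
  my_df$numerator_estimate,
  my_df$numerator_variance,
  my_df$denominator_estimate,
  my_df$denominator_variance
)

my_df
```

Unlike `apply`, this never converts the data frame to a matrix, so a character or factor column elsewhere in `my_df` would not affect the numeric arguments.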

alistaire · Answer 2 · 2018-03-06T01:31:32.260

If you have a function that isn't vectorized, you can apply it across the rows of a data set with `purrr::pmap`, which iterates a function in parallel across the elements of a list (in this case, a data frame). Since you want it to simplify to a numeric vector, use the `pmap_dbl` version:

library(tidyverse)
set.seed(47)    # make sampling reproducible

my_df <- tibble(location = c(1, 2, 3, 4),    # tibble() replaces the deprecated data_frame()
                numerator_estimate = c(625, 180, 210, 1753),
                numerator_variance = c(22165, 2451, 11610, 172968),
                denominator_estimate = c(2278 , 4742, 1115, 26892),
                denominator_variance = c(15870, 688, 7172, 1908288))

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, 
                                  iterations = 10000){
    numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
    denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
    quotient_sim <- numerator_sim/denominator_sim
    quotient_sim_se <- sd(quotient_sim)
    return(quotient_sim_se)
}

my_df <- my_df %>% mutate(quotient_se = pmap_dbl(.[-1], calculate_quotient_se))

my_df %>% select(location, quotient_se)
#> # A tibble: 4 x 2
#>   location quotient_se
#>      <dbl>       <dbl>
#> 1       1.      0.0684
#> 2       2.      0.0104
#> 3       3.      0.0993
#> 4       4.      0.0160

In this case, `.` represents the data piped in, and `[-1]` drops `location`, which shouldn't be passed into the function.

Another option is to rearrange the function so it can take vector inputs. In this case, that probably means working with matrices inside. At scale, this approach is almost always faster, though it may temporarily use more memory to store the intermediate objects.
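A rough sketch of what such a vectorized rewrite could look like follows. The name `calculate_quotient_se_vec` is made up for illustration and is not from the question. It relies on the fact that `rnorm` recycles `mean` and `sd` across draws, which is also why the original function returns one number for the whole column: the draws for all rows get mixed together and `sd()` collapses them to a single value.

```r
# Hypothetical vectorized variant: takes whole columns, returns one SE per element.
calculate_quotient_se_vec <- function(numerator_estimate_f, numerator_variance_f,
                                      denominator_estimate_f, denominator_variance_f,
                                      iterations = 10000) {
  n_rows <- length(numerator_estimate_f)

  # rnorm() cycles through `mean` and `sd` draw by draw, and matrix() fills
  # column by column, so with nrow = n_rows every draw for input i lands in row i.
  numerator_sim <- matrix(
    rnorm(n_rows * iterations, mean = numerator_estimate_f,
          sd = sqrt(numerator_variance_f)),
    nrow = n_rows
  )
  denominator_sim <- matrix(
    rnorm(n_rows * iterations, mean = denominator_estimate_f,
          sd = sqrt(denominator_variance_f)),
    nrow = n_rows
  )

  # Element-wise ratio, then one standard deviation per row.
  quotient_sim <- numerator_sim / denominator_sim
  apply(quotient_sim, 1, sd)
}

set.seed(47)  # arbitrary seed, for reproducibility only
se <- calculate_quotient_se_vec(
  c(625, 180, 210, 1753),
  c(22165, 2451, 11610, 172968),
  c(2278, 4742, 1115, 26892),
  c(15870, 688, 7172, 1908288)
)
se
```

Because it returns one value per input element, a function like this can be called directly inside `mutate` with the column names as arguments. The two intermediate matrices hold `n_rows * iterations` values each, which is the extra memory cost mentioned above.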
