
I know this is a complex title for a post, but I haven't found my exact situation in online examples.

I have a named (non-anonymous) function that takes a row of a tibble, a string (structure), and a numeric (percent) as input and performs linear interpolation, iterating along a subset of the values in the row (NOT a column-wise operation). The "math" of it uses the values in the cells as well as numbers extracted from the names of the columns. The columns have names like GTV0, GTV1, … GTV135.

The working code for this follows. I reproduce it here for completeness, though the specifics aren't necessarily germane to the question below.

# This function works if fed one row of a df at a time, but isn't "multi-dimensional":
Dx <- function(df, structure, percent) {

  # First, make sure we've got our data in the right formats:
  df <- df %>% tibble() %>% select(starts_with(structure)) %>% rowwise() 
  structure <- toString(structure)
  percent <- as.double(percent)
  
  # If we don't have any DVH data for the structure, return "NA"
  if(is.na(df[[9]])) return(NA)
  
  for(i in 9:(length(df) - 1)) {   # The V0 is the 9th entry in the array, so start iterating there.
    
    # Deal with pesky NA's as iterating along (convert to 0's):
    if(is.na(df[[i]])) df[[i]] <- 0
    if(is.na(df[[i+1]])) df[[i+1]] <- 0
    
    # Typically unlikely for the cell's value to be a round percent, but:    
    if(df[[i]] == percent) {
      
      answer <- colnames(df[i])
      return(as.double(str_replace(answer, paste0(structure, "V"), "")))
      
    } else if(df[[i]] > percent && df[[i+1]] < percent) {  # This is why we stop at "length - 1" of data frame.
      # Do the linear interpolation here
      
      # First, capture the names of the two columns:
      column1 <- colnames(df[i])
      column2 <- colnames(df[i+1])
      
      # Strip the structure names from the column names and convert to doubles:
      column1 <- as.double(str_replace(column1, paste0(structure, "V"), ""))
      column2 <- as.double(str_replace(column2, paste0(structure, "V"), ""))
      
      # Perform the linear interpolation:
      return(as.double(column1 + ((percent - df[[i]])/(df[[i+1]] - df[[i]]) * (column2 - column1))))
    }
  }    
}
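To make the arithmetic concrete, here is a minimal base-R sketch of the same interpolation on a single made-up row. The data values are hypothetical, and the column names are assumed to have the form structure + "V" + number (e.g. GTVV0), which is what the str_replace calls above strip off; adjust the prefix if your names differ.

```r
# Hypothetical one-row example: the number in each name is the x-value,
# the cell holds the percent value that is compared against `percent`.
row <- c(GTVV0 = 100, GTVV1 = 98, GTVV2 = 90, GTVV3 = 40)
structure <- "GTV"
percent <- 95

# Strip the assumed "<structure>V" prefix and convert the remainder to numbers.
dose <- as.numeric(sub(paste0("^", structure, "V"), "", names(row)))
vol <- unname(row)

# First position i where vol[i] > percent and vol[i + 1] < percent.
i <- which(head(vol, -1) > percent & tail(vol, -1) < percent)[1]

# Same linear interpolation formula as in Dx.
d95 <- dose[i] + (percent - vol[i]) / (vol[i + 1] - vol[i]) * (dose[i + 1] - dose[i])
d95
#> [1] 1.375
```

Here 95 falls between the values 98 (column 1) and 90 (column 2), so the result is 1 + (95 - 98) / (90 - 98) * (2 - 1) = 1.375.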

How do I purrr-ify this? Ideally, I would use it with mutate to create a new column and fill it with the interpolated values, row by row. The question has two parts:

  1. How do I call the named function Dx?
  2. How do I have to modify the guts of the function (if at all) to work with purrr?

I thought it would be something like:

df <- df %>% rowwise() %>% mutate(GTVD95 = pmap_dfr(df, Dx, "GTV", 95))

But that isn't right.

I can call this existing function with a for loop:

for (i in 1:nrow(df)) {
  df$GTVD95[i] <- Dx(df[i,], "GTV", 95)
}

But that's not ideal. I want to find ~20 interpolated points, so I'd like to wrap even that in a loop rather than write it out 20 times, changing the number (e.g., the two 95's in the loop above) each time.

I appreciate any insight! Thanks in advance!

  • Hi Tom, welcome to Stack Overflow! It will be much easier to answer your question if you provide a sample of your data. Please edit your question with the output of `dput(df[1:10,1:10])`. Please also provide the expected output of this example data so we can check our solution. See [How to make a great R reproducible example](https://stackoverflow.com/a/5963610/) for more. – Ian Campbell Dec 16 '20 at 20:05
  • maybe `mutate(GTVD95 = Dx(cur_data(), "GTV", 95))` will work – Abdessabour Mtk Dec 16 '20 at 20:27
  • That did it. Outstanding! I haven't run across "cur_data()" before. Thanks for the tip! – Tom Dilling Dec 16 '20 at 21:01
  • @TomDilling you're welcome :) – Abdessabour Mtk Dec 18 '20 at 18:24

1 Answer


You can use cur_data() to access the current group's data. Because we're using rowwise(), each group is a single row, so it returns that row's data:

df %>% rowwise() %>% mutate(GTVD95 = print(cur_data()))
#> # A tibble: 1 x 4
#>       a     b     c     d
#>   <int> <int> <int> <int>
#> 1     1     1     1     1
#> # A tibble: 1 x 4
#>       a     b     c     d
#>   <int> <int> <int> <int>
#> 1     2     2     2     2
#> # A tibble: 1 x 4
#>       a     b     c     d
#>   <int> <int> <int> <int>
#> 1     3     3     3     3
#> # A tibble: 1 x 4
#>       a     b     c     d
#>   <int> <int> <int> <int>
#> 1     4     4     4     4
#> # A tibble: 1 x 4
#>       a     b     c     d
#>   <int> <int> <int> <int>
#> 1     5     5     5     5

pmap, by contrast, passes each column's value for the current row as a separate argument, i.e.:

pmap(df, function(...) print(c(...)))
#> a b c d 
#> 1 1 1 1 
#> a b c d 
#> 2 2 2 2 
#> a b c d 
#> 3 3 3 3 
#> a b c d 
#> 4 4 4 4 
#> a b c d 
#> 5 5 5 5

Since your function expects a whole data frame containing the row's data, we can call it this way:

df %>% rowwise() %>% mutate(GTVD95 = Dx(cur_data(), "GTV", 95))
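A side note for newer setups: cur_data() was deprecated in dplyr 1.1.0 in favour of pick(). The sketch below assumes dplyr >= 1.1.0 and uses a made-up helper, row_total, as a stand-in for Dx; it only illustrates that pick(everything()) hands a one-row tibble to the function in a rowwise mutate.

```r
library(dplyr)

# Hypothetical toy data and helper (not from the question).
df <- tibble(a = 1:3, b = 4:6)
row_total <- function(row) sum(unlist(row))  # receives a one-row tibble

out <- df %>%
  rowwise() %>%
  mutate(total = row_total(pick(everything()))) %>%  # plays the role of cur_data()
  ungroup()

out$total
#> [1] 5 7 9
```

With the real function, the call would presumably become mutate(GTVD95 = Dx(pick(everything()), "GTV", 95)).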

Or if you want to use pmap:

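Dx <- function(..., structure, percent) {
  # pmap passes each column of the row as a separate argument;
  # collect them back into a one-row tibble
  df <- as_tibble(list(...))
  # ... (rest of the function body)
}
# Use df or cur_data() as the first argument to pmap inside mutate.
# Note: pmap() returns a list-column; pmap_dbl() returns a numeric column if Dx always yields a single number.
df %>% mutate(GTVD95 = pmap(df, Dx, structure="GTV", percent=95))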
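For the "~20 interpolated points" part of the question, one possible approach is to loop over a vector of percents and build each column name with paste0. This is only a sketch under assumptions: dx_sketch is a hypothetical, simplified stand-in for Dx, the data are made up, and the columns are assumed to be named structure + "V" + number (e.g. GTVV0).

```r
library(dplyr)
library(purrr)

# Hypothetical data: two rows, four DVH columns.
df <- tibble(
  id    = c("a", "b"),
  GTVV0 = c(100, 100),
  GTVV1 = c(98, 96),
  GTVV2 = c(90, 60),
  GTVV3 = c(40, 10)
)

# Simplified stand-in for Dx: takes a one-row data frame, returns one number (or NA).
dx_sketch <- function(row, structure, percent) {
  vol  <- unlist(select(row, starts_with(paste0(structure, "V"))))
  dose <- as.numeric(sub(paste0("^", structure, "V"), "", names(vol)))
  i <- which(head(vol, -1) > percent & tail(vol, -1) < percent)[1]
  unname(dose[i] + (percent - vol[i]) / (vol[i + 1] - vol[i]) * (dose[i + 1] - dose[i]))
}

percents <- c(95, 50)  # extend to as many points as needed

for (p in percents) {
  # One new column per percent, e.g. GTVD95, GTVD50.
  df[[paste0("GTVD", p)]] <- map_dbl(
    seq_len(nrow(df)),
    function(i) dx_sketch(df[i, ], "GTV", p)
  )
}

df
```

The new columns are named GTVD..., so they are not picked up by the "GTVV" prefix on later iterations. With a bare "GTV" prefix, as in the original Dx, they would be selected too, so the selection may need tightening.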
Abdessabour Mtk
  • I was not aware of the `cur_data()` set of functions. That's pretty neat. I've always used `pmap(~with(list(...), print(GTVD95)))` to be able to access columns by name. – Ian Campbell Dec 16 '20 at 20:44
  • @IanCampbell thanks for the feedback, glad to hear that my answer helped someone :) – Abdessabour Mtk Dec 16 '20 at 20:53