0

I am trying to run calculations on multiple variables with similar names (mx1_var1...mx2_var1 etc).

A simplified version of the data is below.

structure(list(mx1_amenable = c(70.0382790687902, 20.8895416774022, 
98.1328630153307, 8.63038330575823, 21.098387740395, 31.959849814698, 
9.22952906324882, 74.4660849895597, 29.6851613973842, 60.941434908354
), mx1_Other = c(50.0261607893197, 46.0117649431311, 51.8219837573084, 
73.7814971552898, 93.8008571298187, 92.6841115228084, 95.660659297798, 
10.8184536035572, 43.6606611340557, 81.4415005182801), mx1_preventable = c(38.6864667127179, 
22.5707957186912, 13.324746863086, 74.9369833030818, 13.0413382062397, 
98.3757571024402, 86.6179643621766, 19.7927752780922, 2.28293032845359, 
67.0137368426169), mx2_amenable = c(63.6636904898683, 40.361275660631, 
3.2234218985236, 80.4870440564426, 49.483719663574, 71.0484920255819, 
97.3726798797323, 30.0044347466731, 25.8476044496246, 39.4468283905231
), mx2_Other = c(4.0822540063483, 52.9579932985574, 38.3393867228102, 
80.8093349013419, 89.5704617034906, 7.15269982141938, 44.9889904260212, 
94.1639871656393, 17.4307996383923, 91.9360333328057), mx2_preventable = c(97.9327560952081, 
42.7026845980086, 74.6785922702186, 27.4754587243202, 14.5174992869947, 
29.298035056885, 3.2058044369044, 44.6985715883816, 33.7262168187378, 
50.9358501169921)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-10L))

I want to run calculations e.g.

mutate(diff_amenable = mx1_amenable)

Across all variables in the dataset as well as further calculations based on the output of these new figures. I think using some sort of string match and function should be able to do it but all I could come across was [this.][1]

At the moment I am working with the data in wide format and manually inputting the column names to run the calculations which is not feasible as I work with more variables (up to 70 paired values).

Any ideas how this could be done? [1]: Function to perform similar calculations on variables with similar names

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • It's not quite clear what calculations you're aiming to do. Do you want to sum across all `mx1` columns for example, then sum across all `mx2`? Or sum those and additionally sum the `_amenable` ones as another column and the `_preventable` ones as a fourth? – Andy Baxter Apr 08 '21 at 16:03
  • Sorry have realised didn't phrase calculations I need properly. I need to first sum all mx1 and mx2 variables (as below - eg mx1_allcause =sum(allmx1vars) then calculate new variables eg diff_amenable=mx1__amenable-mx2_amenable. Then a further set of variables prop_amenable=diff_amenable/diff_allcause. – Jonny Currie Apr 09 '21 at 05:16
  • An attempt at trying it out below. But the solution will depend a lot on what your other variables look like and how R is to detect pairings/groupings. – Andy Baxter Apr 09 '21 at 09:53

1 Answers1

0

This might be a slight step forward - writing functions that give the calculation for a pair of selected columns by name detection in the across function. This works for the six example columns in your dataset:

library(tidyverse)

difference <- function(...) {
  x <- list(...)
  x[[1]][[1]] - x[[1]][[2]]
}

proportion <- function(...) {
  x <- list(...)
  x[[1]][[1]] / x[[1]][[2]]
}

df %>% 
  rowwise() %>%
  transmute(
    mx1_allcause = sum(across(starts_with("mx1"))),
    mx2_allcause = sum(across(starts_with("mx2"))),
    diff_amenable = difference(across(ends_with("_amenable"))),
    diff_allcause = difference(across(ends_with("_allcause"))),
    prop_amenable = proportion(across(starts_with("diff")))
  )
#> # A tibble: 10 x 5
#> # Rowwise: 
#>    mx1_allcause mx2_allcause diff_amenable diff_allcause prop_amenable
#>           <dbl>        <dbl>         <dbl>         <dbl>         <dbl>
#>  1        159.         166.           6.37         -6.93        -0.920
#>  2         89.5        136.         -19.5         -46.5          0.418
#>  3        163.         116.          94.9          47.0          2.02 
#>  4        157.         189.         -71.9         -31.4          2.29 
#>  5        128.         154.         -28.4         -25.6          1.11 
#>  6        223.         107.         -39.1         116.          -0.338
#>  7        192.         146.         -88.1          45.9         -1.92 
#>  8        105.         169.          44.5         -63.8         -0.697
#>  9         75.6         77.0          3.84         -1.38        -2.79 
#> 10        209.         182.          21.5          27.1          0.794

Created on 2021-04-09 by the reprex package (v2.0.0)

Expanding this to your 70+ variables though might be different. My solution here relies on each calculation combining two columns being able to select the two (in order) based on a text match. If there's a need for a more complicated matching of one name to another, you might need a smarter approach or to give in and manually define pairings.

Andy Baxter
  • 5,833
  • 1
  • 8
  • 22