3

I have a dataframe (tibble) with multiple columns, and for each column after the first two I'd like to keep the absolute value, but also insert the value relative to the second column. For example, I start with the following data frame (the column names might vary!):

df = tibble(val1 = 5:10, val2 = 10:15, val3 = 15:20); df
# A tibble: 6 x 3
   val1  val2  val3
  <int> <int> <int>
1     5    10    15
2     6    11    16
3     7    12    17
4     8    13    18
5     9    14    19
6    10    15    20

Now, for the val2 and val3 columns, I also want to insert a column right after val2 and after val3 showing the value relative to val1. How can I do that???

The resulting tibble should look like:

dfrel = tibble(val1 = 5:10, val2 = 10:15, rel2 = val2/val1, val3 = 15:20, rel3 = val3/val1)
dfrel
# A tibble: 6 x 5
   val1  val2  rel2  val3  rel3
  <int> <int> <dbl> <int> <dbl>
1     5    10  2.00    15  3.00
2     6    11  1.83    16  2.67
3     7    12  1.71    17  2.43
4     8    13  1.62    18  2.25
5     9    14  1.56    19  2.11
6    10    15  1.50    20  2.00

Unfortunately, I'm unable to write the proper mutate_at call to insert that relative column right after each value column. In fact, I'm unable to write a mutate_at using funs() that modifies a column by accessing other columns (by position rather than name).

Replacing val2 and val3 by the relative values works (using a lambda function instead of funs), but does not preserve the original val2 and val3 columns as required:

df %>%
     mutate_at(vars(-1), function(v) v/.[[1]])
# A tibble: 6 x 3
   val1  val2  val3
  <int> <dbl> <dbl>
1     5  2.00  3.00
2     6  1.83  2.67
3     7  1.71  2.43
4     8  1.62  2.25
5     9  1.56  2.11
6    10  1.50  2.00

All my attempts to use funs() fail:

df %>%
     mutate_at(vars(-1), funs(./.tbl[[1]]))
Error in mutate_impl(.data, dots) : 
  Evaluation error: object '.tbl' not found.

df %>%
     mutate_at(vars(-1), funs(function(v) v/.[[1]]))
Error in mutate_impl(.data, dots) : 
  Column `val2` is of unsupported type function

One complication compared to Using functions of multiple columns in a dplyr mutate_at call is that my val1 column does not have a fixed name (i.e. it is not always called val1), so I cannot use it by name in the funs arguments. The other complication is that the tibble is created on the fly (using lots of pipe operators) and is typically not stored in a variable, so I cannot simply divide by df[[1]]...

So, what is the proper dplyr approach to insert relative columns (i.e. percent of the first column) after each column?

2 Answers2

1

Give your function a name by wrapping it in a list, so mutate_at will create new columns. Something like following (the column names may not be as ideal so you might need to rename them if needed):

df %>% mutate_at(vars(-1), list(rel = function(v) v / .[[1]]))

# A tibble: 6 x 5
#   val1  val2  val3 val2_rel val3_rel
#  <int> <int> <int>    <dbl>    <dbl>
#1     5    10    15     2.00     3.00
#2     6    11    16     1.83     2.67
#3     7    12    17     1.71     2.43
#4     8    13    18     1.62     2.25
#5     9    14    19     1.56     2.11
#6    10    15    20     1.50     2.00
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Thanks, giving the name postfix in a list is something I would have never thought of. THe naming is not an issue, as I'll post-process the col.names in kable anyway. Unfortunately, your solution appends all relative columns to the end. Is there any way to insert the new columns right after each value column? The number of columns is not fixed, as they are values from different mortality tables that I need to complare for a publication, so rearranging manually is not straightforward. – Reinhold Kainhofer Mar 12 '18 at 15:43
  • How about sorting the column names? Chain `%>% select(sort(names(.)))` after `mutate_at`? This may not retain the original columns order, but it does attach the *rel* column after the corresponding *val* column. – Psidom Mar 12 '18 at 15:48
  • Or if you want to maintain all the columns order, you can do something like this: `select(1, { mut_cols = tail(names(.), -1); cbind(mut_cols[c(T,F)], mut_cols[c(F,T)]) })` – Psidom Mar 12 '18 at 15:59
  • Thanks, I now figured out a way myself to re-order the columns using a helper function to generate the indices: `interleaveColumns = function(v) { c(1, unlist(split(2:length(v), 1:((length(v)-1)/2)), use.names = FALSE)) }` Then a simple select will do `%>% select(interleaveColumns(.))` – Reinhold Kainhofer Mar 12 '18 at 17:27
0

With Psidom's help, this is my final solution to the problem:

interleaveColumns = function(v) { 
    c(1, unlist(split(2:length(v), 1:((length(v)-1)/2)), use.names = FALSE)) 
}

df = tibble(val1 = 5:10, val2 = 10:15, val3 = 15:20, val4 = 25:30, val5 = 1:6);

# mutate_at can be given a named list to create a new column 
# for each existing columnt (appended to the end => we need 
# to reorder the columns and interleave the new columns with 
# the old columns using the interleaveColumns function)

df %>%
     mutate_at(vars(-1), list(rel = function(v) v/.[[1]])) %>% 
     select(interleaveColumns(.))

# A tibble: 6 x 9
   val1  val2 val2_rel  val3 val3_rel  val4 val4_rel  val5 val5_rel
  <int> <int>    <dbl> <int>    <dbl> <int>    <dbl> <int>    <dbl>
1     5    10     2.00    15     3.00    25     5.00     1    0.200
2     6    11     1.83    16     2.67    26     4.33     2    0.333
3     7    12     1.71    17     2.43    27     3.86     3    0.429
4     8    13     1.62    18     2.25    28     3.50     4    0.500
5     9    14     1.56    19     2.11    29     3.22     5    0.556
6    10    15     1.50    20     2.00    30     3.00     6    0.600