2

I have a dataframe structured like the following, with the number of x_L and x_R pair being possibly up to 100.

ID  Side  A_L   A_R   B_L   B_R
1    0     7     5     6     3
2    1     3     2     3     1
3    0     6     3     4     5

I know want to create a new column A_ratio for each pair A_Land A_R. The value of that new column for each ID should either be A_L / A_R if Side==0, or A_R / A_L if Side==1.

My attempt was implementing a for loop with an increment of 2, to cover each pair only once, and using ifelse to differentiate between the two possible values of Side

for (i in seq(from = 3, to = length(df), by = 2)) {
  df[sub("_L", "_ratio",(names(df[i]))] <-ifelse(df[2]==0, df[i] / df[i+1], df[i+1] / df[i])
}

This however leads to the following warning:

In `[<-.data.frame`(`*tmp*`, sub("_L", "_ratio", names(df[i])),  : provided 3 variables to replace 1 variables

The resulting values seem to be A_L / A_R for all cases, even for the IDs with Side==1. Could someone point out my mistake? I know that there should be an easier dplyr approach as well, but I'm unfortunately not familiar with that yet. Thanks in advance!

Adrian Mak
  • 137
  • 8

2 Answers2

3

A base R option

cols <- c("A", "B")
cbind(
    df,
    `colnames<-`(
        sapply(
            cols,
            function(x) {
                with(
                    df,
                    do.call("/", df[startsWith(names(df), x)])^(1 - 2 * Side)
                )
            }
        ), paste0(cols, "_ratio")
    )
)

gives

  ID Side A_L A_R B_L B_R   A_ratio   B_ratio
1  1    0   7   5   6   3 1.4000000 2.0000000
2  2    1   3   2   3   1 0.6666667 0.3333333
3  3    0   6   3   4   5 2.0000000 0.8000000
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • Thank you for that answer! My "real" dataframe now contains a lot more columns `A B C D ...`. Is there an easy way to upscale this approach to all of these columns, by either names or indices? – Adrian Mak Mar 24 '21 at 13:36
3

Here's a dplyr approach.

It relies on only mutating columns that end with "L". Then using cur_column() to get the name of the column being worked on, we can create the name of the matching column with str_replace and get() its value.

We can combine this approach with case_when to consider the effect of Side.

Finally, we can use rename_with to fix up the column names since they will be based on the original L column name.

library(dplyr) 
library(stringr)
df %>%
  mutate(across(ends_with("L"), ~case_when(Side == 1 ~ get(str_replace(cur_column(),"L$","R"))/.,
                                           Side == 0 ~ ./get(str_replace(cur_column(),"L$","R"))),
                .names = "{.col}_Ratio")) %>%
  rename_with(~str_replace(.,"_L",""),contains("_L_"))
  ID Side A_L A_R B_L B_R   A_Ratio   B_Ratio
1  1    0   7   5   6   3 1.4000000 2.0000000
2  2    1   3   2   3   1 0.6666667 0.3333333
3  3    0   6   3   4   5 2.0000000 0.8000000
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
  • Thanks a lot for the answer and also the explanation, this will help a lot to understand the workings of `dplyr`! I needed to get the library `stringr` before running, but then it worked out perfectly. – Adrian Mak Mar 24 '21 at 14:00
  • I have a follow up question if that's okay: I now want to use `dplyr::mutate` to binarize all of my ratios, turning all values of `A_ratio < 1` to `0` and all `A_ratio >= 1` to `1`. Do spot the mistake in the following `mutate(across(ends_with("ratio"), if_else (cur_column() < 1, 0, 1), .names = "{.col}_inf)`? Thanks! – Adrian Mak Mar 24 '21 at 19:10
  • 1
    Your problem is that `cur_column()` returns a character string of the current column. If you want the values of that column, use `.`. Also, you need to preface anonymous function you have created with `~` which is interpreted by tidy evaluation as `function(x)` and assigns `x` to `.`, `.x`, and `..1`. See https://stackoverflow.com/a/62488532/ for more. – Ian Campbell Mar 24 '21 at 20:23