How do I create a loop and/or a function to divide 200 columns (and create 200 new columns/variables) by another column to get a percentage?

Question

How do I do this in a loop so I can do 200 columns? and how do I name the name the columns so that it is the old column name with a "p_" in front of it? Is this possible?

For example I'm trying to do something like this but with 200 columns.

fans <- data.frame(
  population = c(1234, 5678, 2345, 6789, 3456, 7890,
                4567, 8901, 5678, 9012, 6789),
  bearsfans = c(123, 234, 345, 456, 567,678, 789, 890, 901, 135, 246),
  packersfans = c(11,22,33,44,55,66,77,88,99,100,122),
  vikingsfans = c(39, 49, 59, 61, 32, 22, 31, 92, 52, 10, 122))

print(fans)

attach(fans)

## create new columns which are the ratio of fans to population

fans$p_bearsfan = bearsfans/population
print(fans)


 Output:
 ##     population bearsfans packersfans vikingsfans p_bearsfan
 ##  1        1234       123          11          39 0.09967585
 ##  2        5678       234          22          49 0.04121169

score 1 · Accepted Answer · answered Sep 19 '19 at 22:07

temp <- sapply(fans[-1], function(x) x / fans$population)
colnames(temp) <- paste0("p_", colnames(temp))
cbind(fans, temp)



population bearsfans packersfans vikingsfans p_bearsfans p_packersfans p_vikingsfans
1        1234       123          11          39  0.09967585   0.008914100   0.031604538
2        5678       234          22          49  0.04121169   0.003874604   0.008629799
3        2345       345          33          59  0.14712154   0.014072495   0.025159915
4        6789       456          44          61  0.06716748   0.006481072   0.008985123
5        3456       567          55          32  0.16406250   0.015914352   0.009259259
6        7890       678          66          22  0.08593156   0.008365019   0.002788340
7        4567       789          77          31  0.17276111   0.016860083   0.006787826
8        8901       890          88          92  0.09998877   0.009886530   0.010335917
9        5678       901          99          52  0.15868263   0.017435717   0.009158154
10       9012       135         100          10  0.01498003   0.011096316   0.001109632
11       6789       246         122         122  0.03623509   0.017970246   0.017970246

EXCELLENT, it worked! Thank you! I new there had to be a simple way to do it, but I'm new to functions. — shp5009, Sep 20 '19 at 14:36
Your welcome. Please note you can only mark one answer as accepted, but feel free to pick your favourite. :) — Lennyy, Sep 20 '19 at 14:38
@shp5009 While you can only select one answer as the accepted answer, you can *upvote* as many as you like;-) — Maurits Evers, Sep 21 '19 at 09:16
True that, but then you need 15 reputation. 2 to go! I upvoted the question. And your answer ;) — Lennyy, Sep 21 '19 at 09:23

score 1 · Answer 2 · answered Sep 19 '19 at 22:08

If you're happy with a suffixed new column name (instead of prefixing), this is a one-liner using dplyr::mutate_at. I assume here that all relevant columns end with the word "fans".

With suffixes

fans %>% mutate_at(vars(ends_with("fans")), list(percent = ~.x / population))
#   population bearsfans packersfans vikingsfans bearsfans_percent
#1        1234       123          11          39        0.09967585
#2        5678       234          22          49        0.04121169
#3        2345       345          33          59        0.14712154
#4        6789       456          44          61        0.06716748
#5        3456       567          55          32        0.16406250
#6        7890       678          66          22        0.08593156
#7        4567       789          77          31        0.17276111
#8        8901       890          88          92        0.09998877
#9        5678       901          99          52        0.15868263
#10       9012       135         100          10        0.01498003
#11       6789       246         122         122        0.03623509
#   packersfans_percent vikingsfans_percent
#1          0.008914100         0.031604538
#2          0.003874604         0.008629799
#3          0.014072495         0.025159915
#4          0.006481072         0.008985123
#5          0.015914352         0.009259259
#6          0.008365019         0.002788340
#7          0.016860083         0.006787826
#8          0.009886530         0.010335917
#9          0.017435717         0.009158154
#10         0.011096316         0.001109632
#11         0.017970246         0.017970246

With prefixes

To turn the suffixes into prefixes requires on more step

fans %>%
    mutate_at(vars(ends_with("fans")), list(percent = ~.x / population)) %>%
    rename_at(vars(ends_with("percent")), ~sub("(.+)_percent", "p_\\1", .x))
#   population bearsfans packersfans vikingsfans p_bearsfans p_packersfans
#1        1234       123          11          39  0.09967585   0.008914100
#2        5678       234          22          49  0.04121169   0.003874604
#3        2345       345          33          59  0.14712154   0.014072495
#4        6789       456          44          61  0.06716748   0.006481072
#5        3456       567          55          32  0.16406250   0.015914352
#6        7890       678          66          22  0.08593156   0.008365019
#7        4567       789          77          31  0.17276111   0.016860083
#8        8901       890          88          92  0.09998877   0.009886530
#9        5678       901          99          52  0.15868263   0.017435717
#10       9012       135         100          10  0.01498003   0.011096316
#11       6789       246         122         122  0.03623509   0.017970246
#   p_vikingsfans
#1    0.031604538
#2    0.008629799
#3    0.025159915
#4    0.008985123
#5    0.009259259
#6    0.002788340
#7    0.006787826
#8    0.010335917
#9    0.009158154
#10   0.001109632
#11   0.017970246

Thanks. This is very helpful. I maybe shouldn't have added fans to the variable names and was just thinking columns. My variables do have groups of names with the same suffixes but not all of them and the suffixes are different by group. Some of the columns are named completely different. Are there ways to do it just by column number and still add the 'p_' to the beginning or end of the variable name? — shp5009, Sep 20 '19 at 14:07
@shp5009 Sure; you can specify the column numbers directly. For example, `fans %>% mutate_at(2:3, list(percent = ~.x / population))` creates new percentage columns only based on columns `2:3`. — Maurits Evers, Sep 20 '19 at 14:18

score 1 · Answer 3 · answered Sep 20 '19 at 00:40

We can directly divide multiple columns with one column. We use grep to select columns which end with "fans" and use those names to assign new columns.

cols <- grep("fans$", names(fans), value = TRUE)
fans[paste0("p_", cols)] <- fans[cols]/fans$population

fans
#   population bearsfans packersfans vikingsfans p_bearsfans p_packersfans p_vikingsfans
#1        1234       123          11          39     0.09968      0.008914      0.031605
#2        5678       234          22          49     0.04121      0.003875      0.008630
#3        2345       345          33          59     0.14712      0.014072      0.025160
#4        6789       456          44          61     0.06717      0.006481      0.008985
#5        3456       567          55          32     0.16406      0.015914      0.009259
#6        7890       678          66          22     0.08593      0.008365      0.002788
#7        4567       789          77          31     0.17276      0.016860      0.006788
#8        8901       890          88          92     0.09999      0.009887      0.010336
#9        5678       901          99          52     0.15868      0.017436      0.009158
#10       9012       135         100          10     0.01498      0.011096      0.001110
#11       6789       246         122         122     0.03624      0.017970      0.017970

Also as a side note : Why is it not advisable to use attach() in R, and what should I use instead?

score 0 · Answer 4 · edited Jun 20 '20 at 09:12

0

Data

  fans <- data.frame(
  population = c(1234, 5678, 2345, 6789, 3456, 7890,
                 4567, 8901, 5678, 9012, 6789),
  bearsfans = c(123, 234, 345, 456, 567,678, 789, 890, 901, 135, 246),
  packersfans = c(11,22,33,44,55,66,77,88,99,100,122),
  vikingsfans = c(39, 49, 59, 61, 32, 22, 31, 92, 52, 10, 122))

Naive Solution

for (c in names(fans)[-1]){
  fans[[paste0("p_",c)]] <- fans[[c]] / fans[["population"]]
}

edited Jun 20 '20 at 09:12

Community

1
1

answered Sep 19 '19 at 21:44

slava-kohut

4,203
1
7
24

1

Having two nested loops (`apply` within `for`) is not necessary and creates unnecessary overhead. In fact there shouldn't be a need for `apply` here as simple division in R `/` is already vectorised. – Maurits Evers Sep 19 '19 at 22:14
1

[continued] So the `for` loop would become `for (c in names(fans)[2:ncol(fans)]) fans[[paste0("p_",c)]] <- fans[[c]] / fans[["population"]]`. – Maurits Evers Sep 19 '19 at 23:10
@MauritsEvers yes I agree, that's very inefficient - I realize that. Edited my post. – slava-kohut Sep 19 '19 at 23:13
@MauritsEvers out of curiosity, I checked how slow is the solution with `apply`. It is still almost 10x faster than `mutate` (but 3x slower than naive loop over columns - the fastest of all posted in this thread). – slava-kohut Sep 19 '19 at 23:49
You should include the results from those benchmark tests in your answer. I'm sure this will be interesting to OP and future readers. I certainly would be interested. It doesn't surprise me that `mutate_at` is not the fastest method. `tidyverse` methods are usually more about code readability (and even that would be contested by some) than about speed. – Maurits Evers Sep 19 '19 at 23:55
@MauritsEvers Yes, will do when time permits. I also included one more solution using `Reduce` in benchmarks, but it is still slower than the naive loop. – slava-kohut Sep 20 '19 at 00:01
Great! You may already be aware of the R package `microbenchmark` but if not, it's a great library to quickly perform benchmark comparisons and show their results. Definitely worth an upvote from me if you were to include all this in your post:-) – Maurits Evers Sep 20 '19 at 00:16
Thanks, everybody. – shp5009 Sep 20 '19 at 14:37

How do I create a loop and/or a function to divide 200 columns (and create 200 new columns/variables) by another column to get a percentage?

4 Answers4

With suffixes

With prefixes

Data

Naive Solution