pmin of columns with names matching a pattern

Question

I've got a table such as this:

df <- structure(list(Suggested.Symbol = c("CCT4", "DHRS2", "PMS2", 
"FARSB", "RPL31", "ASNS"), gwas_p.onset = c(0.9378, 0.5983, 7.674e-10, 
0.09781, 0.5495, 0.7841), gwas_p.dc14 = c(0.3975, 0.3707, 6.117e-17, 
0.2975, 0.4443, 0.7661), gwas_p.tfc6 = c(0.2078, 0.896, 7.388e-19, 
0.5896, 0.3043, 0.6696), gwas_p.tms30 = c(0.5724, 0.3409, 4.594e-13, 
0.2403, 0.1357, 0.3422)), row.names = c(NA, 6L), class = "data.frame")

I can find the row-wise minimum of specific columns by naming them, like so:

library(dplyr)

df <- df %>%
  mutate(p.min = pmin(gwas_p.onset, gwas_p.dc14))

However, how would I find the pmin of all columns whose names match a certain pattern, e.g. column names starting with "gwas_p"?

score 3 · Accepted Answer · answered Jul 21 '22 at 21:29

You could use do.call() with pmin() after selecting the necessary columns by the given name pattern (using startsWith()):

> transform(df, p.min = do.call(pmin, df[startsWith(names(df), "gwas_p")]))
  Suggested.Symbol gwas_p.onset gwas_p.dc14 gwas_p.tfc6 gwas_p.tms30     p.min
1             CCT4    9.378e-01   3.975e-01   2.078e-01    5.724e-01 2.078e-01
2            DHRS2    5.983e-01   3.707e-01   8.960e-01    3.409e-01 3.409e-01
3             PMS2    7.674e-10   6.117e-17   7.388e-19    4.594e-13 7.388e-19
4            FARSB    9.781e-02   2.975e-01   5.896e-01    2.403e-01 9.781e-02
5            RPL31    5.495e-01   4.443e-01   3.043e-01    1.357e-01 1.357e-01
6             ASNS    7.841e-01   7.661e-01   6.696e-01    3.422e-01 3.422e-01
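The answer uses startsWith(), which only matches a fixed prefix. As a minimal sketch of how the same do.call() idea extends to a genuine regular-expression pattern, the base R code below selects the columns with grep() and cross-checks the result against Reduce(pmin, ...), which applies pmin() pairwise. The three-row data frame is a cut-down copy of the question's data, and p_cols is an illustrative name, not something from the original answer.

```r
# Cut-down copy of the question's data (first three rows, three p-value columns)
df <- data.frame(
  Suggested.Symbol = c("CCT4", "DHRS2", "PMS2"),
  gwas_p.onset = c(0.9378, 0.5983, 7.674e-10),
  gwas_p.dc14  = c(0.3975, 0.3707, 6.117e-17),
  gwas_p.tfc6  = c(0.2078, 0.896, 7.388e-19)
)

# grep() accepts any regular expression; value = TRUE returns the matching
# column names rather than their positions (p_cols is a hypothetical name)
p_cols <- grep("^gwas_p", names(df), value = TRUE)

# do.call() spreads the selected columns into separate arguments, so this is
# equivalent to pmin(df$gwas_p.onset, df$gwas_p.dc14, df$gwas_p.tfc6)
df$p.min <- do.call(pmin, df[p_cols])

# Reduce() folds pmin() over the same columns pairwise and agrees with do.call()
stopifnot(identical(df$p.min, Reduce(pmin, df[p_cols])))
print(df$p.min)
```

Selecting by name (value = TRUE) rather than by a logical mask keeps the selection valid even after the new p.min column has been appended to df.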

score 2 · Answer 2 · answered Jul 21 '22 at 21:18

This can be done in plain dplyr with rowwise() and c_across(). See this link for more solutions: Get the min of two columns

library(dplyr)

df %>%
  rowwise() %>%
  mutate(minimum = min(c_across(starts_with("gwas_p"))))

dcsuka


Worth noting that this is quite inefficient, as I think you're calling `min()` for *every* single row one by one. It won't matter if you've got 1000 rows in your data, but if you've got a million or more, you'll be waiting minutes compared to `pmin`, which will finish in an instant. – thelatemail Jul 21 '22 at 21:40
Fair enough. But the elegance of `dplyr` compensates for the lost speed at least partially. – dcsuka Jul 21 '22 at 21:52
If you want to use the tidyverse you can do `df %>% mutate(minimum = invoke(pmin, select(.,starts_with("gwas_p"))))` and have the best of both worlds. – thelatemail Jul 21 '22 at 22:11
One way to use `dplyr` with `pmin` is: `df %>% mutate(p.min = select(., starts_with("gwas_p")) %>% do.call(pmin, .))`. This is largely inspired by @ThomasIsCoding's solution, though. – PaulS Jul 21 '22 at 22:13
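To make the efficiency point in these comments concrete, here is a minimal sketch that runs both dplyr routes on a cut-down copy of the question's data and checks that they agree. It assumes dplyr >= 1.1.0 for pick(); on older versions, across(starts_with("gwas_p")) with no function supplied returns the selected columns in the same way. The names by_row and vectorised are illustrative only.

```r
library(dplyr)

# Cut-down copy of the question's data (first three rows)
df <- data.frame(
  Suggested.Symbol = c("CCT4", "DHRS2", "PMS2"),
  gwas_p.onset = c(0.9378, 0.5983, 7.674e-10),
  gwas_p.dc14  = c(0.3975, 0.3707, 6.117e-17),
  gwas_p.tfc6  = c(0.2078, 0.896, 7.388e-19)
)

# Row by row: min() is evaluated once per row via rowwise()/c_across().
# ungroup() drops the rowwise grouping so later verbs behave normally.
by_row <- df %>%
  rowwise() %>%
  mutate(p.min = min(c_across(starts_with("gwas_p")))) %>%
  ungroup()

# Vectorised: pick() returns the matching columns as a data frame, and
# do.call() passes each column to a single pmin() call.
vectorised <- df %>%
  mutate(p.min = do.call(pmin, pick(starts_with("gwas_p"))))

# Both approaches give the same row minima
stopifnot(isTRUE(all.equal(by_row$p.min, vectorised$p.min)))
```

The rowwise() version calls min() once per row, while the pick() version makes one vectorised pmin() call over whole columns, which is why it scales better as the number of rows grows.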
