I have a data frame containing the results of multiple Boruta variable selections, with environmental variables as predictors. These predictors often come from different sources (e.g. sources a, b, c), so some that represent the same parameter have been coded with different suffixes (e.g. nitrogen_a, nitrogen_b, nitrogen_c, phosphate_a, phosphate_b, etc.).
I need a way to use something like grepl() to identify and group variables whose names share the same prefix, and then collapse each group into a single variable named after that shared prefix (e.g. nitrogen, phosphate).
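One way to find the groups is sketched below in base R. It assumes that every source-specific column ends in an underscore followed by one of the source codes a, b or c, and the column names are invented for illustration. The sketch strips that suffix with sub() and then splits the original names on whatever remains.

```r
# Hypothetical column names; the suffixes a, b, c stand in for the data sources
vars <- c("nitrogen_a", "nitrogen_b", "nitrogen_c", "phosphate_a", "phosphate_b")

# Strip a trailing "_<source>" suffix to recover the shared parameter name
# (assumes the suffix is an underscore plus one of the known source codes)
prefixes <- sub("_(a|b|c)$", "", vars)

# Group the original names by their shared prefix
groups <- split(vars, prefixes)
groups
```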
Note that in each row, only one variable within a group sharing a name prefix contains a non-NA value, so it should be possible to collapse the group into one variable simply by dropping the NA values. All variables are character vectors.
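Because each row holds at most one non-NA value per group, the collapse amounts to a "coalesce". The idea is to walk across a group's columns and fill each NA from the next column. The sketch below uses a hypothetical data frame and a hypothetical helper, collapse_by_prefix(). It also assumes the prefixes contain no regular-expression metacharacters.

```r
# Hypothetical example data: one non-NA value per parameter group in each row
df <- data.frame(
  site        = c("s1", "s2", "s3"),
  nitrogen_a  = c("1.2", NA,    NA),
  nitrogen_b  = c(NA,    "0.8", NA),
  nitrogen_c  = c(NA,    NA,    "2.1"),
  phosphate_a = c("0.3", NA,    "0.5"),
  phosphate_b = c(NA,    "0.4", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the columns sharing each prefix into one column,
# keeping the first non-NA value found in each row
collapse_by_prefix <- function(data, prefixes) {
  for (p in prefixes) {
    # grepl() picks out the columns whose names start with "<prefix>_"
    cols <- names(data)[grepl(paste0("^", p, "_"), names(data))]
    if (length(cols) == 0) next
    # Fold across the group's columns, replacing NAs with the next column's value
    data[[p]] <- Reduce(function(x, y) ifelse(is.na(x), y, x), data[cols])
    # Drop the original source-specific columns
    data <- data[setdiff(names(data), cols)]
  }
  data
}

out <- collapse_by_prefix(df, c("nitrogen", "phosphate"))
out
```

If a row ever did contain more than one non-NA value in a group, this would keep the leftmost one without warning. If you already use dplyr, its coalesce() function implements the same first-non-NA logic.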
How might I go about this?