Combining dataframe columns with same prefix

Question

I have a dataframe containing the results of multiple Boruta variable selections with environmental variables from different sources as predictors. These predictors are often from different sources (e.g. sources a, b, c), so some have been coded with different suffixes, but representing the same parameter (e.g nitrogen_a, nitrogen_b, nitrogen_c, phosphate_a, phosphate_b etc.).

I need a way to use something like grepl() to identify and group variables that have the same start of the name and collapse them into single variables with the shared variable (e.g. nitrogen, phosphate).

Note, for each row only one variable within a set with a shared variable name prefix contains a non-NA value. So it should be possible to collapse multiple variables into one by simply excluding the NA values. All variables are character vectors.

How might I go about this?

[See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with, including a workable sample of data — camille, Dec 17 '21 at 22:23

score 0 · Accepted Answer · answered Dec 17 '21 at 22:04

Your results are currently in wide format. The strategy I took was to first make the data long, remove the NA rows, and make the table wide again using updated column names. Here's a tidyverse solution:

library(tidyverse)

data = tribble(
  ~ id, ~ x_1, ~ x_2, ~ y_1, ~ y_2,
  "a", 1, NA, 100, NA,
  "b", NA, 2, NA, 200
)

data %>% 
  pivot_longer(-id) %>% 
  filter(!is.na(value)) %>% 
  # Update name so that everything until the _ is kept
  mutate(name = str_extract(name, "[^_]+")) %>% 
  pivot_wider(names_from = "name", values_from = "value")

The data:

> data
# A tibble: 2 x 5
  id      x_1   x_2   y_1   y_2
  <chr> <dbl> <dbl> <dbl> <dbl>
1 a         1    NA   100    NA
2 b        NA     2    NA   200

The output:

# A tibble: 2 x 3
  id        x     y
  <chr> <dbl> <dbl>
1 a         1   100
2 b         2   200

This is perfect. Thanks so much. This worked well for solving my problem. I can't believe I didn't think to transform to long format first! — Matthew Holland, Dec 18 '21 at 08:01

Combining dataframe columns with same prefix

1 Answers1