I am trying to create a new column in a tibble that is the concatenation of several string columns. The names of these columns all fit a pattern: they start with the same substring. I have tried every combination I can think of, selecting inside and outside mutate, with each of paste, str_c, and unite, to no avail.

Reprex:

library(tibble); library(dplyr); library(stringr)
df <- tibble(
    include1 = c("a", "b", "c"),
    include2 = c("d", "e", NA),
    include3 = c("f", "g", "h"),
    include4 = c("i", NA, NA),
    ignore = c("j", "k", "l")
    )

df
# A tibble: 3 x 5
  include1 include2 include3 include4 ignore
  <chr>    <chr>    <chr>    <chr>    <chr> 
1 a        d        f        i        j     
2 b        e        g        NA       k     
3 c        NA       h        NA       l     

I'm trying code that looks like variants of:

df %>% 
    mutate(included = str_c(starts_with("include"), " | ", na.rm = TRUE)) %>% 
    select(ignore, included)

with the expected output:

# A tibble: 3 x 2
  ignore included     
  <chr>  <chr>        
1 j      a | d | f | i
2 k      b | e | g    
3 l      c | h    

How may I achieve this?

markus
Rob Creel
  • Relevant: [suppress NAs in paste()](https://stackoverflow.com/questions/13673894/suppress-nas-in-paste) – markus Nov 06 '20 at 23:02
  • This post has a lot of similar suggestions for your question - https://stackoverflow.com/questions/52712390/how-do-i-remove-nas-with-the-tidyrunite-function/ – Ronak Shah Nov 07 '20 at 01:55

2 Answers

1

You can do:

library(dplyr)
library(purrr)

df %>%
  transmute(ignore, 
            included = pmap_chr(df %>% select(-ignore), ~ paste(na.omit(c(...)), collapse = " | ")))

# A tibble: 3 x 2
  ignore included     
  <chr>  <chr>        
1 j      a | d | f | i
2 k      b | e | g    
3 l      c | h        
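
To unpack the one-liner above, here is a minimal sketch of the same idea with the anonymous function written out as a named helper. `collapse_non_na` is a hypothetical name introduced only for illustration, not a function from any package. The sketch also picks the columns by prefix with `starts_with("include")` instead of excluding `ignore` by name, which may be easier when there are many columns to leave out.

```r
library(dplyr)
library(purrr)

# Same example data as in the question
df <- tibble::tibble(
  include1 = c("a", "b", "c"),
  include2 = c("d", "e", NA),
  include3 = c("f", "g", "h"),
  include4 = c("i", NA, NA),
  ignore   = c("j", "k", "l")
)

# Hypothetical helper (not part of any package): takes one row's values,
# drops the NAs, and joins what is left with " | "
collapse_non_na <- function(...) {
  values <- c(...)
  paste(values[!is.na(values)], collapse = " | ")
}

result <- df %>%
  transmute(
    ignore,
    # pmap_chr() steps through the selected columns in parallel, calls the
    # helper once per row with that row's values, and returns a character vector
    included = pmap_chr(select(df, starts_with("include")), collapse_non_na)
  )

result
```

Because `pmap_chr()` passes each row's values as separate arguments, the helper collects them with `...`; the `~ paste(na.omit(c(...)), collapse = " | ")` formula in the answer is shorthand for the same thing.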
Ritchie Sacramento
  • Can you break down and explain your answer? `purrr` syntax is something special. – mhovd Nov 06 '20 at 22:53
  • This works well on my minimal example, which I think I made too minimal. In my real data, there are many more columns to ignore, so `select(-ignore)` doesn't work as well. – Rob Creel Nov 07 '20 at 17:58
0

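We can use unite from tidyr with na.rm = TRUE, which drops NA values before the columns are pasted together. Setting sep = " | " gives the spacing shown in the expected output:

library(dplyr)
library(tidyr)
df %>%
    unite(included, starts_with('include'), na.rm = TRUE, sep = " | ") %>%
    select(ignore, included)

Output:

# A tibble: 3 x 2
#  ignore included     
#  <chr>  <chr>        
#1 j      a | d | f | i
#2 k      b | e | g    
#3 l      c | h        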
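
Since the question asks for a new column, it may help to know that `unite()` drops its input columns by default. The sketch below, which assumes tidyr 1.0.0 or later for the `na.rm` argument, passes `remove = FALSE` to keep the original `include*` columns alongside the combined one, and uses `sep = " | "` to match the spacing in the question's expected output.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- tibble::tibble(
  include1 = c("a", "b", "c"),
  include2 = c("d", "e", NA),
  include3 = c("f", "g", "h"),
  include4 = c("i", NA, NA),
  ignore   = c("j", "k", "l")
)

result <- df %>%
  unite(
    "included",
    starts_with("include"),
    sep = " | ",
    na.rm = TRUE,    # drop NA values before joining
    remove = FALSE   # keep include1..include4 in the result
  )

result
```

A following `select()` can then keep whichever columns are needed, as in the answer above.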
akrun