0

I am trying to paste 2 text columns together. The problem is that some of the values in each column are NA and if that is the case, I don't want the NA as part of my pasted string. Here is an example of what I mean

some data:

dat <- data.frame("col1" = c("stuff", "stuff", "stuff", NA, NA),
           "col2" = c("things", NA, "things", "things", NA))

dat

   col1   col2
1 stuff things
2 stuff   <NA>
3 stuff things
4  <NA> things
5  <NA>   <NA>

This is what I need:

col1   col2          col3
1 stuff things stuff; things
2 stuff   <NA>         stuff
3 stuff things stuff; things
4  <NA> things        things
5  <NA>  <NA>          <NA>

I can use paste() and then clean up the mess with gsub() but I was looking for a better one-liner.

Thanks

user1658170
  • 814
  • 2
  • 14
  • 24
  • you could try this: `apply(dat, 1, function(f) ifelse(all(is.na(f)), NA, paste0(f[which(!is.na(f))], collapse = "; ")))` – TinglTanglBob May 24 '19 at 21:32

2 Answers2

4

Here's a way in base R -

dat$col3 <- apply(dat, 1, function(x) paste0(na.omit(x), collapse = "; "))

   col1   col2          col3
1 stuff things stuff; things
2 stuff   <NA>         stuff
3 stuff things stuff; things
4  <NA> things        things
5  <NA>   <NA>              
Shree
  • 10,835
  • 1
  • 14
  • 36
3

We can use tidyverse approach. Loop through the rows with pmap, remove the NA elements with na.omit and paste the rows together

library(tidyverse)
dat %>% 
  mutate_all(as.character) %>% 
  mutate(col3 = pmap_chr(., ~ c(...) %>% 
                             na.omit %>%
                             paste(collapse="; ")))
# col1   col2          col3
#1 stuff things stuff; things
#2 stuff   <NA>         stuff
#3 stuff things stuff; things
#4  <NA> things        things
#5  <NA>   <NA>   

Or another option is

dat %>%
   mutate_all(as.character) %>%
  mutate(col3 = case_when(is.na(col1)|is.na(col2) ~ 
       coalesce(col1, col2), 
       TRUE ~ str_c(pmin(col1, col2, na.rm = TRUE), 
             pmax(col1, col2, na.rm = TRUE), sep="; ")))
#   col1   col2          col3
#1 stuff things stuff; things
#2 stuff   <NA>         stuff
#3 stuff things stuff; things
#4  <NA> things        things
#5  <NA>   <NA>          <NA>

Or using a vectorized approach with base R

i1 <- !!rowSums(!is.na(dat))
dat$col3[i1] <- gsub("^NA;|; NA", "", do.call(paste, c(dat[i1,], sep="; ")))
dat
#   col1   col2          col3
#1 stuff things stuff; things
#2 stuff   <NA>         stuff
#3 stuff things stuff; things
#4  <NA> things        things
#5  <NA>   <NA>          <NA>
akrun
  • 874,273
  • 37
  • 540
  • 662