Assume a string vector like this:

x <- c("abc", "abcde", "abcde123")

I want to insert a separator (a comma or any other string) between every character of each string, to get something like this (here the separator is a comma):

[1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"

I am able to achieve it with:

sapply(strsplit(x, "", fixed = TRUE), function(x) paste(x, collapse = ","))

However, I am curious whether there is a different way to achieve it.

tmfmnk
  • A shorter version without the generic function: `sapply(strsplit(x, ""), paste, collapse=",")`. – lmo Jun 15 '19 at 19:16
  • I have already answered the question as stated but was wondering why you need this. – G. Grothendieck Jun 15 '19 at 19:52
  • @G. Grothendieck I was looking for a new way to approach situations involving `tidyr` `separate()` with `sep = ""`. – tmfmnk Jun 15 '19 at 19:59
  • In that case you can do this: `data.frame(X = "abc") %>% separate(X, into = c("A", "B", "C"), sep = "\\B")` – G. Grothendieck Jun 15 '19 at 20:03
  • @G. Grothendieck really nice! Your help is greatly appreciated :) – tmfmnk Jun 15 '19 at 20:05
  • or try this: `data.frame(X = "abc") %>% separate(X, into = c("A", "B", "C"), sep = 1:3)` – G. Grothendieck Jun 15 '19 at 20:32
  • @G.Grothendieck Perhaps add the two `separate` alternatives (and other?) to [Separate a column into multiple columns using tidyr::separate with sep=“”](https://stackoverflow.com/questions/28956264/separate-a-column-into-multiple-columns-using-tidyrseparate-with-sep) Cheers. – Henrik Jun 15 '19 at 20:38
  • OK. I have added it there. The \\B does not work in that case because the data does not consist of only word characters but the integer sep works. – G. Grothendieck Jun 15 '19 at 23:13

2 Answers

7

1) Using zero width matches. The lookbehind (?<=.) and the lookahead (?=.) match a character before and after the position where we want a comma, respectively, but are zero width in that they do not consume any characters.

gsub("(?<=.)(?=.)", ",", x, perl = TRUE)
## [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"

1a) This also works. Here we match and capture a character that is followed by another character (checked with a zero-width lookahead, so it is not consumed) and replace the match with the captured character plus a comma.

gsub("(.)(?=.)", "\\1,", x, perl = TRUE)
## [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"
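If the separator needs to vary, either pattern above could be wrapped in a small helper. The following is only a sketch: `insert_sep` is a hypothetical name, not part of base R or of the original answer, and it assumes the separator contains no backslashes (which `gsub` treats specially in the replacement string).

```r
# Hypothetical helper (not from the original answer): insert `sep` between
# adjacent characters using the zero-width lookaround pattern from 1).
# Assumes `sep` contains no backslashes, which gsub() would treat specially
# in the replacement string.
insert_sep <- function(x, sep = ",") {
  gsub("(?<=.)(?=.)", sep, x, perl = TRUE)
}

x <- c("abc", "abcde", "abcde123")
insert_sep(x)        # comma-separated, as in the question
insert_sep(x, "-")   # same idea with a different separator
```

A single-character string has no position with a character on both sides, so it should come back unchanged.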

2) Inserting and trimming. Another approach is to replace the boundaries with a comma and then trim off the commas at the beginning and end. This one does not require perl regular expressions. Be sure NOT to use perl = TRUE with this, because PCRE treats \b differently from the default regex engine.

gsub("^,|,$", "", gsub("\\b", ",", x))
## [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"

\\K also works in place of \\b using perl = TRUE.

2a) In R 3.6 and later (but not earlier) trimws has a whitespace argument that allows trimming of arbitrary characters, so this can be simplified to:

trimws(gsub("\\b", ",", x), whitespace = ",")
## [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"

2b) This variation works even pre-3.6 but assumes that there are no tabs in the strings. It replaces each boundary with a tab, trims whitespace off the ends and then replaces tabs with commas.

chartr("\t", ",", trimws(gsub("\\b", "\t", x)))
## [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"

2c) It seems from the discussion under the question that comma was only an example and whitespace would be just as good as far as the poster is concerned. In that case we could simplify this to:

trimws(gsub("\\b", " ", x))
## [1] "a b c"           "a b c d e"       "a b c d e 1 2 3"

3) \B. Replace non-boundaries with a comma like this. Be sure to specify perl regular expressions. This works if the strings contain only word characters (alphanumerics and underscore), but not if they contain non-word characters such as spaces or punctuation.

gsub("\\B", ",", x, perl = TRUE)
## [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"
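To illustrate the limitation just mentioned, here is a small comparison sketch. The vector `y` and the result names are made up for the example; the two patterns are the ones from 3) and 1).

```r
y <- c("abc", "a-b", "a b")

# \B matches only where both neighbours are word characters or neither is,
# so nothing is inserted next to the hyphen or the space.
res_nonboundary <- gsub("\\B", ",", y, perl = TRUE)

# The lookaround pattern from 1) inserts a comma between any two characters.
res_lookaround <- gsub("(?<=.)(?=.)", ",", y, perl = TRUE)

print(res_nonboundary)
## [1] "a,b,c" "a-b"   "a b"
print(res_lookaround)
## [1] "a,b,c" "a,-,b" "a, ,b"
```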
G. Grothendieck
  • The possibility involving `trimws()` is really nice as it also shows the new argument `whitespace` in it. Thank you :) – tmfmnk Jun 15 '19 at 19:52
0

What's wrong with it? You could pass paste directly to sapply, which forwards extra arguments such as collapse to it, and skip the anonymous function.

sapply(strsplit(x, ""), paste, collapse=",")
# [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"
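As a side note, the same call can be written with `vapply`, which declares the expected result type up front. This is a sketch of a variant, not something the original answer proposed; the names `res` and `empty` are only for illustration.

```r
x <- c("abc", "abcde", "abcde123")

# vapply() checks that each result is a single character string, and it
# returns character(0) for empty input, where sapply() would return list().
res <- vapply(strsplit(x, ""), paste, character(1), collapse = ",")
print(res)

# Zero-length input still yields a character vector.
empty <- vapply(strsplit(character(0), ""), paste, character(1), collapse = ",")
```

This can matter when the result feeds into code that expects a character vector regardless of input length.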

Alternatively you could use gregexpr (inspired by @Rich Scriven).

sapply(regmatches(x, gregexpr(".", x)), paste, collapse=",")
# [1] "a,b,c"           "a,b,c,d,e"       "a,b,c,d,e,1,2,3"
jay.sf