3

I'm relatively new to regex, so bear with me if the question is trivial. I'd like to place a comma between every letter of a string using regex, e.g.:

x <- "ABCD"

I want to get

"A,B,C,D"

It would be nice if I could do that using gsub, sub, or a related function on a vector of strings with an arbitrary number of characters.

I tried

> sub("(\\w)", "\\1,", x)
[1] "A,BCD"
> gsub("(\\w)", "\\1,", x)
[1] "A,B,C,D,"
> gsub("(\\w)(\\w{1})$", "\\1,\\2", x)
[1] "ABC,D"
Wiktor Stribiżew
Tom

4 Answers

6

Try:

x <- 'ABCD'
gsub('\\B', ',', x, perl = TRUE)

Prints:

[1] "A,B,C,D"

I might have misread the question; the OP is looking to add commas between letters only. Therefore, try:

gsub('(\\p{L})(?=\\p{L})', '\\1,', x, perl = TRUE)
  • (\p{L}) - Match any kind of letter from any language in a 1st group;
  • (?=\p{L}) - Positive lookahead asserting that the next character is also a letter, without consuming it.

The replacement then uses the backreference \\1 to this capture group, followed by a comma.
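To make the difference between the two patterns concrete, here is a small sketch using the `Hello,12&4` string from the comments under this answer. The variable names `y`, `by_non_boundary` and `by_letters` are only for illustration.

```r
# Illustrative input with letters, digits and punctuation (the name `y` is arbitrary)
y <- "Hello,12&4"

# \B matches every position that is not a word boundary, so a comma
# is also inserted between the digits 1 and 2
by_non_boundary <- gsub("\\B", ",", y, perl = TRUE)
by_non_boundary
# [1] "H,e,l,l,o,1,2&4"

# A letter followed by another letter: only the letters get separated
by_letters <- gsub("(\\p{L})(?=\\p{L})", "\\1,", y, perl = TRUE)
by_letters
# [1] "H,e,l,l,o,12&4"
```

So `\B` is fine for purely alphabetic input, while the `\p{L}` version is the safer choice when digits or punctuation may appear.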

JvdV
  • 1
    ah, right: *‘⁠\B⁠’ matches the empty string provided it is not at an edge of a word*; another helpful symbol. – Tom Oct 31 '22 at 09:15
  • @Tom, I do have a follow-up question though. Do you wish to add a comma between **all** characters, including the commas themselves? For example, what is your desired output for, say, `Hello,12&4`? Though `\B` works for the current sample data, it may cause unexpected results. – JvdV Oct 31 '22 at 09:16
  • hehe, nice. But no, not in this instance – Tom Oct 31 '22 at 09:17
  • @Tom so `\B` might not be what you are looking for as it would also match positions that are not exclusively between letters. I interpreted your question wrong at first but edited my answer to include a way to match the position between letters only. – JvdV Oct 31 '22 at 09:22
  • ok, *‘\p{xx}’ and ‘\P{xx}’ which match characters with and without property ‘xx’ respectively*, where `L` stands for "letter"?! – Tom Oct 31 '22 at 09:25
  • 1
    @Tom, yes. See [this](https://www.regular-expressions.info/unicode.html) list for referencing. – JvdV Oct 31 '22 at 09:26
  • 1
    Nice one and imho best fitting answer yet, especially the second part. – bobble bubble Oct 31 '22 at 10:24
3

You can use

> gsub("(.)(?=.)", "\\1,", x, perl=TRUE)
[1] "A,B,C,D"

The (.)(?=.) regex matches any char, capturing it into Group 1 (with (.)), that must be followed by any single char ((?=.) is a positive lookahead that requires a char immediately to the right of the current location).

Variations of the solution:

> gsub("(.)(?!$)", "\\1,", x, perl=TRUE)
## Or with stringr:
## stringr::str_replace_all(x, "(.)(?!$)", "\\1,")
[1] "A,B,C,D"

Here, (?!$) is a negative lookahead that fails the match if the current position is the end of the string.

See the R demo online:

x <- "ABCD"
gsub("(.)(?=.)", "\\1,", x, perl=TRUE)
# => [1] "A,B,C,D"
gsub("(.)(?!$)", "\\1,", x, perl=TRUE)
# => [1] "A,B,C,D"
stringr::str_replace_all(x, "(.)(?!$)", "\\1,")
# => [1] "A,B,C,D"
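Since the question asks about a vector of strings of arbitrary length, here is a minimal sketch of the same pattern applied to several elements at once. The vector `xs` and the name `res` are made up for illustration, and the one-character and empty strings are there only to show the edge cases.

```r
# Hypothetical vector with strings of different lengths, including edge cases
xs <- c("ABCD", "XY", "Z", "")

# gsub() is vectorised over its input, so each element is processed separately;
# a single character or an empty string has no char followed by another char
res <- gsub("(.)(?=.)", "\\1,", xs, perl = TRUE)
res
# => "A,B,C,D" "X,Y" "Z" ""
```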
Wiktor Stribiżew
  • 1
    ah. Thanks! I thought it was something about the lookahead, but I didn't manage to make it work. Thanks also for the explanation, this is really helpful. – Tom Oct 31 '22 at 09:09
2

A non-regex answer:

paste(strsplit(x, "")[[1]], collapse = ",")
#[1] "A,B,C,D"
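The `[[1]]` above picks out the first string only. For a vector of strings, one possible sketch (not part of the original answer) is to loop over the list returned by `strsplit()` with `vapply()`; the names `xs` and `res` are hypothetical.

```r
# Hypothetical input vector
xs <- c("ABCD", "XY", "Z")

# strsplit() returns one character vector per input string;
# vapply() pastes each of them back together with commas
res <- vapply(strsplit(xs, ""), paste, character(1), collapse = ",")
res
# => "A,B,C,D" "X,Y" "Z"
```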
Maël
0

Another option is to use a positive lookbehind and a positive lookahead to assert that there is both a preceding and a following character, so the match is the empty position between two characters:

library(stringr)
str_replace_all(x, "(?<=.)(?=.)", ",")
[1] "A,B,C,D"
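If you would rather not load stringr, the same lookaround pattern should also work with base R's `gsub()` as long as `perl = TRUE` is set. This is a sketch rather than part of the original answer, and `xs` and `res` are illustrative names.

```r
# Hypothetical input vector
xs <- c("ABCD", "Hi!")

# The pattern matches the empty position between any two characters,
# so nothing is captured and the replacement is just the comma
res <- gsub("(?<=.)(?=.)", ",", xs, perl = TRUE)
res
# => "A,B,C,D" "H,i,!"
```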
Chris Ruehlemann