I have strings that look like this:

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

In between each group, there's " & ". I want to use R (either sub() or something from the stringr package) to replace every " &" with a "," when there's more than one "&" present. However, I don't want the final "&" to be changed. How would I do that so it looks like:

#Note: Only the 3rd and 4th strings should be changed
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3", "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

In the actual strings, there could be any number of "&"s, so I don't want to hard-code a limit if possible.

Wiktor Stribiżew
J.Sabree

6 Answers

We could use a regular expression with a lookahead assertion (see "Regex lookahead, lookbehind and atomic groups").

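library(stringr)
# Replace " &" with "," only when another "&" follows later in the string.
# The space after the "&" is left in place, so the replacement is "," rather than ", ".
str_replace_all(problem, " &(?=.*?&)", ",")

output:

[1] "GROUP 1"
[2] "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3"
[4] "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"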
TarJae

Using strsplit

 # Split on " & ", rejoin with ", ", then turn the last ", " back into " & ".
 sapply(strsplit(problem, "\\s+&\\s+"), 
    function(x) sub(", ([^,]+$)", " & \\1", toString(x)))

Output:

[1] "GROUP 1"                              "GROUP 1 & GROUP 2"                   "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"
akrun
  • @akrun, why do you get a one-line output, [1]? If I apply your code I get four lines, [1]-[4]. – TarJae Oct 08 '21 at 19:51
  • @TarJae it could be that the width setting in your options is different from mine. – akrun Oct 08 '21 at 21:20

You can use

" \K&(?= .* & )"

The pattern (which starts with a literal space) matches:

  • " \K" Match a space, then reset the start of the match so the space is not part of what gets replaced
  • & Match a literal &
  • (?= .* & ) Positive lookahead: assert a space directly to the right and, further on, another & with a space on each side

For example

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
gsub(" \\K&(?= .* & )", ",", problem, perl=TRUE)

Output

[1] "GROUP 1"                              
[2] "GROUP 1 & GROUP 2"                    
[3] "GROUP 1 , GROUP 2 & GROUP 3"          
[4] "GROUP 1 , GROUP 2 , GROUP 3 & GROUP 4"
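
The output above keeps the space in front of each comma ("GROUP 1 , GROUP 2"), which differs slightly from the format the question asks for. A minimal sketch of one way to drop it, assuming the separator is always exactly " & ": let the space be part of the match instead of resetting it with \K. The variable name `result` is only for illustration.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Match " &" (space included) only when another " & " follows later,
# and replace the whole match with a comma.
result <- gsub(" &(?= .* & )", ",", problem, perl = TRUE)
print(result)
```

Because the last " & " in each string has no further " & " after it, the lookahead fails there and that separator is left alone.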
The fourth bird

It could be done using Perl mode and the \G anchor.

Ensure there are 2 or more &'s, then match any & that has another one downstream.

(?m)(?:^(?=.*&.*&)|(?!^)\G)[^&\n]*\K&(?=.*&)

Replace each match with a comma (,).

https://regex101.com/r/Mtvopf/1

 (?m)
 (?:
    ^ 
    (?= .* & .* & )
  | (?! ^ )
    \G 
 )
 [^&\n]* \K &
 (?= .* & )
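
The answer gives only the pattern, so here is a sketch of how it might be passed to gsub() in R. It assumes perl = TRUE (needed for \G and \K) and doubles every backslash for the R string literal; the variable names `pattern` and `result` are illustrative, not part of the original answer.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Same pattern as above, with backslashes escaped for an R string.
pattern <- "(?m)(?:^(?=.*&.*&)|(?!^)\\G)[^&\\n]*\\K&(?=.*&)"

# Only the "&" itself is matched (everything before \K is dropped from
# the match), so the space in front of each replaced "&" stays in place.
result <- gsub(pattern, ",", problem, perl = TRUE)
print(result)
```

Strings with fewer than two &'s are left untouched, since neither branch of the leading alternation can succeed for them.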
sln

Another solution:

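library(stringr)  # also makes the %>% pipe available
# First turn every " &" into ",", then restore " & " before the final group.
# [0-9]+ allows group numbers with more than one digit.
str_replace_all(problem, " &", ",") %>% 
  str_replace(", (GROUP [0-9]+)$", " & \\1")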
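
The second step above relies on every item looking like "GROUP" followed by a number. If the real strings use other names, a possible base R sketch of the same two-step idea is shown below. It assumes the items themselves never contain a comma; `all_commas` and `result` are illustrative names.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Step 1: turn every " &" into ",".
all_commas <- gsub(" &", ",", problem, fixed = TRUE)

# Step 2: turn the last comma (the one followed only by non-comma
# characters up to the end of the string) back into " &".
result <- sub(",([^,]*)$", " &\\1", all_commas)
print(result)
```

Strings without any "&" pass through both steps unchanged, because neither pattern finds anything to replace.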
PaulS

Use

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
library(stringr)
str_replace_all(problem, "\\s*&\\s*(?=[^&]*&)", ", ")

Results:

[1] "GROUP 1"                             "GROUP 1 & GROUP 2"                  
[3] "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"

EXPLANATION

--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  &                        '&'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^&]*                    any character except: '&' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )                        end of look-ahead
Ryszard Czech