I happened to be looking at this question, and I chanced upon this string:
`#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334`
And I got interested in trying to replace only the commas (`,`) within the substring `[#40,#221,#268,#281]` with underscores (`_`). I was attempting this in R with the `stringr` package, and my idea was to use `str_replace()` as follows:
- First, locate the substring in the parent string with lookarounds: `(?<=\\[).+(?=\\[)`. (I am using `\\` to escape since that's what `stringr` uses.)
- Then match all instances of only the commas within the substring with `[^0-9#]+`. So now the regex would be `(?<=\\[)[^0-9#]+(?=\\[)`.
- Now use `str_replace()` to replace the above matches with `_` as follows: `str_replace(mystring, "(?<=\\[)[^0-9#]+(?=\\[)", "_")`, where `mystring` contains the string `#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334`.
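A minimal reproduction of the attempt, assuming `stringr` is installed. With this input the pattern finds no match at all, so `str_replace()` silently returns the string unchanged instead of raising an error, which is why the attempt appears to "do nothing".

```r
library(stringr)

mystring <- "#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334"

# The attempt described above, verbatim.
attempt <- str_replace(mystring, "(?<=\\[)[^0-9#]+(?=\\[)", "_")

# No match is found, so the input comes back unchanged.
identical(attempt, mystring)
```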
I thought the regex I constructed should parse as: replace one or more characters that are not digits or `#` within the bounds of `[` and `]` with the character `_`. But evidently this isn't the case, as my attempt did not work.
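A short sketch of why the combined pattern cannot match, again assuming `stringr`. Lookbehinds and lookaheads only test the characters immediately beside the match, so a pattern of the form `(?<=\\[)...(?=\\])` forces the match itself to run from just after `[` to just before `]`; the lookarounds do not mark out a region inside which smaller matches may occur. Note also that the lookahead written above, `(?=\\[)`, asks for a second `[`, which this string does not contain; the sketch contrasts it with `(?=\\])`. The variable names are illustrative only.

```r
library(stringr)

mystring <- "#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334"

# Lookarounds only check the characters directly beside the match.
with_open_lookahead  <- str_extract(mystring, "(?<=\\[).+(?=\\[)")  # NA: there is no second "["
with_close_lookahead <- str_extract(mystring, "(?<=\\[).+(?=\\])")  # "#40,#221,#268,#281"

# [^0-9#]+ would have to start right after "[", but that character is "#",
# so the combined pattern can never match anywhere in the string.
starts_after_bracket <- str_detect(mystring, "(?<=\\[)[^0-9#]+")    # FALSE

# Separately, str_replace() stops after the first match,
# while str_replace_all() replaces every match.
first_only <- str_replace("a,b,c", ",", "_")      # "a_b,c"
every      <- str_replace_all("a,b,c", ",", "_")  # "a_b_c"
```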
Where am I going wrong and what is/are the right way(s) to solve regex problems of this kind?
tl;dr: how does one extract all tokens but a certain token (or set of tokens) from a substring bounded by two other arbitrary tokens?
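Two possible approaches, sketched under the assumption that brackets are not nested and every `[` is closed by a `]`. The first matches the whole bracketed token and fixes the commas inside it through a replacement function; this assumes a `stringr` version whose `str_replace_all()` accepts a function as `replacement` (current releases do). The second uses a single pattern: a comma is replaced only if a `]` can be reached after it without passing another `[` or `]`. The base R `gsub(..., perl = TRUE)` line is included as a `stringr`-free equivalent of the second approach.

```r
library(stringr)

mystring <- "#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40,#221,#268,#281],#2334"

# Approach 1: match each whole "[...]" token, then replace the commas
# inside the matched text only. The function receives the matched text.
fixed1 <- str_replace_all(
  mystring,
  "\\[[^\\]]*\\]",
  function(m) str_replace_all(m, ",", "_")
)

# Approach 2: one pattern. The lookahead accepts a comma only if a "]"
# follows it with no "[" or "]" in between, i.e. the comma is inside brackets.
fixed2 <- str_replace_all(mystring, ",(?=[^\\[\\]]*\\])", "_")

# The same lookahead pattern in base R (PCRE engine), no stringr needed.
fixed3 <- gsub(",(?=[^\\[\\]]*\\])", "_", mystring, perl = TRUE)

# All three give:
# "#2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5,$,$, [#40_#221_#268_#281],#2334"
fixed1
```

Both patterns rely on the delimiters not nesting; the lookahead version carries over to other delimiter pairs by swapping the bracket characters in both places.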