Ruby replace word

Question

I have a specific situation. I am trying to replace some words in a string. I have two example strings:

string1 = "aaabbb aaa bbb" 
string2 = "a. bbb"

In string1 I want to replace the full word "aaa" with "ccc", so I do it like this:

translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") # => "aaabbb ccc bbb"

So it works and I am happy, but when I try to replace "a." with "aaa" it does not work and returns string2 unchanged.

I also tried this:

translation = "a."
string2.gsub(translation, "aaa") # => "aaa bbb"

But when I use the above (plain string) gsub on string1, I get "cccbbb ccc bbb". Sorry for my English, but I hope I explained it understandably. Thanks for all answers.
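
A minimal reproduction of the three behaviours described above, as a sketch that reuses the question's variable names. The comments state why each call behaves as it does:

```ruby
string1 = "aaabbb aaa bbb"
string2 = "a. bbb"

# Regexp with word boundaries: only the standalone "aaa" is replaced.
translation = "aaa"
r1 = string1.gsub(/\b#{translation}\b/, "ccc")

# Same approach with "a.": \b cannot match between "." and " "
# (both are non-word characters), so nothing is replaced.
translation = "a."
r2 = string2.gsub(/\b#{translation}\b/, "aaa")

# A String pattern is matched literally but has no notion of word
# boundaries, so the "aaa" inside "aaabbb" is replaced too.
r3 = string1.gsub("aaa", "ccc")

p r1 # => "aaabbb ccc bbb"
p r2 # => "a. bbb"
p r3 # => "cccbbb ccc bbb"
```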

score 2 · Answer 1 · edited May 23 '17 at 12:13


Try

string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")

In a regex '.' means "any character" (except a newline). By calling Regexp.escape you turn 'a.' into 'a\.', which means "a followed by a literal period".
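
A small sketch of what Regexp.escape changes; the test string "ab a." is made up for illustration:

```ruby
translation = "a."

# Regexp.escape backslash-escapes regex metacharacters such as ".",
# so the text is matched literally once interpolated into a Regexp.
escaped = Regexp.escape(translation) # => "a\\." (the three characters a \ .)

# Unescaped, "." matches any character, e.g. the "b" in "ab".
loose  = "ab a.".gsub(/#{translation}/, "x") # => "x x"
# Escaped, only a literal "a." matches.
strict = "ab a.".gsub(/#{escaped}/, "x")     # => "ab x"
```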

Update
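As @Daniel has noted in the comments, word boundaries have some subtleties: \b does not match after "a." when a space follows, because "." and " " are both non-word characters. So for the above to work with "a." you need to replace the \b with look-behinds and look-aheads: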

    string2.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
    # => "ccc bbb"
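
A sketch of the \b subtlety, comparing the word-boundary pattern with the lookaround pattern on two sample strings (the second string, "a.b", is only for illustration):

```ruby
token       = Regexp.escape("a.")
with_b      = /\b#{token}\b/
with_around = /(?<!\w)#{token}(?!\w)/

# \b needs a word/non-word transition. After "." (a non-word character)
# it only matches when a word character follows.
b_space  = "a. bbb".match?(with_b) # => false ("." then " ": no boundary)
b_letter = "a.b".match?(with_b)    # => true  ("." then "b": boundary)

# The lookarounds only require "no word character on either side".
look_space  = "a. bbb".match?(with_around) # => true
look_letter = "a.b".match?(with_around)    # => false
```

So the lookaround version behaves the opposite way from \b exactly in the case that matters here: a token ending in punctuation followed by a space.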


answered May 09 '14 at 09:15

Uri Agassi


Sure, but `"a. bbb".gsub(/\b#{Regexp.escape("a.")}\b/, "ccc")` does not replace `"a."` by `"ccc"` like it should. I think the `\b` does not match after `.` since `.` is not a word character. So this does not really answer the question of replacing `"a."`. – Daniël Knippers May 09 '14 at 09:16
@DaniëlKnippers updated my answer to work with such curious cases – Uri Agassi May 09 '14 at 09:30

@UriAgassi Good, I was about to post an answer myself using lookahead and lookbehind, but now I'll give you +1 ;) Btw there is still a typo, you need `}` after `(translation)` to close the `#{` in both your code blocks. – Daniël Knippers May 09 '14 at 09:34
@UriAgassi Please note that `!\w` is not suitable for cases like this: `"b.a. bbb".gsub(/(?<!\w)#{Regexp.escape("a.")}(?!\w)/, "ccc") #=> "b.ccc bbb"` I don't think this effect is wanted. Maybe it is better to check for whitespace or start/end of line. – mdesantis May 09 '14 at 09:41
Something like this: `"a. b.a. a. bbb".gsub(/(?<=^|\s)#{Regexp.escape("a.")}(?=\s|$)/, "ccc") #=> "ccc b.a. ccc bbb"` user2239655 what do you think about it? – mdesantis May 09 '14 at 09:44
@mdesantis It's a valid suggestion, it depends on what the OP wants and on the definition of "full word" I suppose. Perhaps just whitespace (or start/end of string) is indeed a better delimiter than any of the `\W` characters. Although note that `^` is the beginning of a *line*, whereas `\A` is the start of the *string*. Likewise for `$` and `\z`, see [the docs](http://www.ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Anchors). – Daniël Knippers May 09 '14 at 10:03
@DaniëlKnippers I'm aware of the `\A\z`/`^$` difference; I did write start/end of _line_. As you wrote, it depends on the OP's needs; my reasoning is motivated by the fact that the OP is considering dots as part of a word, and `\w` excludes dots. – mdesantis May 09 '14 at 10:07

@mdesantis I must have read your comment as 'start/end of string', my bad. We agree then :). – Daniël Knippers May 09 '14 at 10:10
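
A short sketch of the `^`/`$` versus `\A`/`\z` distinction discussed in the comments, using a made-up two-line string:

```ruby
text = "first line\na. second line"

# ^ matches at the start of every line, i.e. also right after "\n".
line_start   = text.match?(/^a\./)  # => true
# \A matches only at the very start of the string.
string_start = text.match?(/\Aa\./) # => false

# Likewise at the end: $ matches before "\n", \z only at the true end.
line_end   = "a.\nx".match?(/a\.$/)  # => true
string_end = "a.\nx".match?(/a\.\z/) # => false
```

In the whitespace-delimited pattern from the comments the choice rarely matters, because a newline is itself matched by \s.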

score 1 · Answer 2 · answered May 09 '14 at 10:52

Since \w excludes dots, which I guess the OP wants to count as token characters, I propose a whitelist lookaround approach:

string = "a. b.a. a. bbb"
translation = "a."

# With (?<!\w)/(?!\w), "b.a." is not treated as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"

# With (?<=^|\s)/(?=\s|$), "b.a." is treated as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a.
#=> "ccc b.a. ccc bbb"

Anyway, whether my reasoning is right depends on the OP's needs ;-)
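
Both definitions of "full word" can be wrapped in one helper. This is a sketch only: `replace_token` and its `mode:` option are hypothetical names, not part of Ruby. For the whitespace mode it uses `(?<!\S)`/`(?!\S)` ("not touching a non-whitespace character"), which is equivalent to "whitespace or start/end of string" and avoids alternation inside the lookbehind:

```ruby
# Hypothetical helper: replace whole occurrences of `token`, where
# "whole" depends on `mode`:
#   :word       -> no word character (\w) directly on either side
#   :whitespace -> bounded by whitespace or the start/end of the string
def replace_token(str, token, replacement, mode: :whitespace)
  escaped = Regexp.escape(token) # match the token literally
  pattern =
    case mode
    when :word       then /(?<!\w)#{escaped}(?!\w)/
    when :whitespace then /(?<!\S)#{escaped}(?!\S)/
    else raise ArgumentError, "unknown mode: #{mode.inspect}"
    end
  str.gsub(pattern, replacement)
end

s = "a. b.a. a. bbb"
word_result  = replace_token(s, "a.", "ccc", mode: :word)       # => "ccc b.ccc ccc bbb"
space_result = replace_token(s, "a.", "ccc", mode: :whitespace) # => "ccc b.a. ccc bbb"
```

Which mode is right depends, as said above, on whether punctuation counts as part of a word.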

score 0 · Answer 3 · answered May 09 '14 at 09:15


The . (dot) has a special meaning in regexes: it matches any character (except a newline, unless the m flag is set).

You should escape it with \.
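
A sketch showing that hand-escaping gives the same pattern as Regexp.escape, and (as the comment below points out) that escaping alone does not fix the \b problem:

```ruby
# A hand-escaped literal and an interpolated Regexp.escape give the same source.
manual      = /a\./
generated   = /#{Regexp.escape("a.")}/
same_source = manual.source == generated.source # => true

# Escaping does not fix the word-boundary problem: \b still cannot match
# between "." and the following space, so nothing is replaced.
unchanged = "a. bbb".gsub(/\ba\.\b/, "ccc") # => "a. bbb"
```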


Kostas Rousis


While your statements are true, this alone does not solve the problems with the replacement due to the behavior of the word boundary `\b` in the Regexp. See the discussion at Uri Agassi's answer. – Daniël Knippers May 09 '14 at 09:39
