1

I have a specific situation. I am trying to replace some words in a string. I have two example strings:

string1 = "aaabbb aaa bbb" 
string2 = "a. bbb"

In string1 I want to replace the full word "aaa" with "ccc", so I do it like this:

translation = "aaa"
string1.gsub(/\b#{translation}\b/, "ccc") => "aaabbb ccc bbb"

That works and I am happy, but when I try to replace "a." with "aaa" in string2 the same way, it does not work and returns string2 unchanged.

I tried also this:

translation = "a."
string2.gsub(translation, "aaa") => "aaa bbb"

But when I use the plain-string gsub above on string1 (with translation = "aaa" and "ccc" as the replacement), I get "cccbbb ccc bbb". Sorry for my English; I hope I explained it understandably. Thanks for all answers.

user2239655

3 Answers

2

Try

string1.gsub(/\b#{Regexp.escape(translation)}\b/, "ccc")

In a regex, '.' means "any character". By calling Regexp.escape you turn 'a.' into 'a\.', which means "an a followed by a literal period".
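To see what the escaping changes, here is a small sketch. The sample string "ab a." and the replacement "X" are made up for illustration and are not from the question.

```ruby
translation = "a."

# Regexp.escape backslash-escapes regex metacharacters such as the dot.
escaped = Regexp.escape(translation)   # => "a\\." (the characters a, \, .)

# Unescaped, the dot matches any character, so "ab" is replaced as well.
unescaped_result = "ab a.".gsub(/#{translation}/, "X")   # => "X X"

# Escaped, only the literal text "a." is replaced.
escaped_result = "ab a.".gsub(/#{escaped}/, "X")         # => "ab X"
```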


Update
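As @Daniel has noted in the comments, word boundaries have some subtleties: \b does not match after the "." in "a. bbb", because neither "." nor the space that follows it is a word character. So for the above to work with "a." you need to replace the \b with a negative look-behind and a negative look-ahead: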

    # with translation = "a."
    string2.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
    # => "ccc bbb"
Uri Agassi
  • Sure, but `"a. bbb".gsub(/\b#{Regexp.escape("a.")}\b/, "ccc")` does not replace `"a."` with `"ccc"` like it should. I think the `\b` does not match after `.` since `.` is not a word character. So this does not really answer the question of replacing `"a."`. – Daniël Knippers May 09 '14 at 09:16
  • @DaniëlKnippers updated my answer to work with such curious cases – Uri Agassi May 09 '14 at 09:30
  • 1
    @UriAgassi Good, I was about to post an answer myself using lookahead and lookbehind, but now I'll give you +1 ;) Btw there is still a typo, you need `}` after `(translation)` to close the `#{` in both your code blocks. – Daniël Knippers May 09 '14 at 09:34
  • @UriAgassi Please note that `!\w` is not suitable for cases like this: `"b.a. bbb".gsub(/(?<!\w)#{Regexp.escape("a.")}(?!\w)/, "ccc") #=> "b.ccc bbb"` I don't think this effect is wanted. Maybe is better to check for: whitespace || start / end of line – mdesantis May 09 '14 at 09:41
  • Something like this: `"a. b.a. a. bbb".gsub(/(?<=^|\s)#{Regexp.escape("a.")}(?=\s|$)/, "ccc") #=> "ccc b.a. ccc bbb"` user2239655 what do you think about it? – mdesantis May 09 '14 at 09:44
  • @mdesantis It's a valid suggestion, it depends on what the OP wants and on the definition of "full word" I suppose. Perhaps just whitespace (or start/end of string) is indeed a better delimiter than any of the `\W` characters. Although note that `^` is the beginning of a *line*, whereas `\A` is the start of the *string*. Likewise for `$` and `\z`, see [the docs](http://www.ruby-doc.org/core-2.1.1/Regexp.html#class-Regexp-label-Anchors). – Daniël Knippers May 09 '14 at 10:03
  • @DaniëlKnippers I'm aware of the `\A\z`/`^$` difference; I did write start/end of _line_. As you wrote, it depends on the OP's needs; my reasoning is motivated by the fact that the OP is treating dots as part of a word, and `\w` excludes dots. – mdesantis May 09 '14 at 10:07
  • 1
    @mdesantis I must have read your comment as 'start/end of string', my bad. We agree then :). – Daniël Knippers May 09 '14 at 10:10
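For anyone unsure about the line versus string anchors discussed in these comments, a minimal sketch of the difference. The two-line sample string is an assumption for illustration only.

```ruby
text = "foo\nfoo"

# ^ and $ match at the start/end of every line.
line_start = text.gsub(/^foo/, "X")     # => "X\nX"
line_end   = text.gsub(/foo$/, "X")     # => "X\nX"

# \A and \z match only at the start/end of the whole string.
string_start = text.gsub(/\Afoo/, "X")  # => "X\nfoo"
string_end   = text.gsub(/foo\z/, "X")  # => "foo\nX"
```

Inside the `(?<=^|\s)` and `(?=\s|$)` lookarounds the distinction probably matters little, since a newline is itself whitespace and is already covered by `\s`.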
1

Since \w excludes dots, which I guess the OP wants to treat as part of a token, I propose lookarounds that whitelist the allowed delimiters (whitespace or the start/end of a line) instead of excluding word characters:

string = "a. b.a. a. bbb"
translation = "a."

# Using (?<!\w) and (?!\w), "b.a." is not treated as a single token
string.gsub(/(?<!\w)#{Regexp.escape(translation)}(?!\w)/, "ccc")
# Notice b.ccc
#=> "ccc b.ccc ccc bbb"

# Using \s in the lookarounds, "b.a." is treated as a single token
string.gsub(/(?<=^|\s)#{Regexp.escape(translation)}(?=\s|$)/, "ccc")
# Notice b.a.
#=> "ccc b.a. ccc bbb"

Anyway, whether my reasoning is right depends on what the OP needs ;-)
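If this is needed in more than one place, the whitespace-delimited pattern above could be wrapped in a small method. This is only a sketch: `replace_token` is a hypothetical name, and it assumes tokens are separated by whitespace or by the start/end of a line.

```ruby
# Hypothetical helper (not part of Ruby or of the answers): replaces `token`
# only where it is delimited by whitespace or by the start/end of a line.
def replace_token(text, token, replacement)
  pattern = /(?<=^|\s)#{Regexp.escape(token)}(?=\s|$)/
  # The block form inserts the replacement literally, so backslashes in it
  # are not interpreted as back-references.
  text.gsub(pattern) { replacement }
end

result1 = replace_token("aaabbb aaa bbb", "aaa", "ccc")  # => "aaabbb ccc bbb"
result2 = replace_token("a. bbb", "a.", "aaa")           # => "aaa bbb"
result3 = replace_token("a. b.a. a. bbb", "a.", "ccc")   # => "ccc b.a. ccc bbb"
```

The first two calls use the strings from the question, so both of the OP's cases go through the same pattern.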

mdesantis
0

The . (dot) has a special meaning in regexes: it means match any character.

You should escape it as \. to match a literal dot.

Kostas Rousis
  • While your statements are true, this alone does not solve the problems with the replacement due to the behavior of the word boundary `\b` in the Regexp. See the discussion at Uri Agassi's answer. – Daniël Knippers May 09 '14 at 09:39