
I have a string:

story = 'A long foo ago, in a foo bar baz, baz away...foobar'

I also have a list of matches from this string (the list is dynamic; I don't control its contents):

string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar'] # words can be repeated

How can I wrap each match in ** (for example, **foo**) to get this result:

story = 'A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foobar**'

For example, my code:

string_matches.each do |word|
  story.gsub!(/#{word}/, "**#{word}**")
end

returned:

"A long ****foo**** ago, in a ****foo**** **bar** ****baz****, ****baz**** away...****foo******bar**"
evans

2 Answers


If the keywords must be matched as whole words only, you may use

story.gsub(/\b(?:#{Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }).source})\b/, '**\0**')

If the whole-word check is not necessary, use

story.gsub(Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }), '**\0**')

See the Ruby demo

Details

  • \b - a word boundary
  • (?:#{Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }).source}) - this creates a pattern like (?:foobar|foo|bar|baz) that matches a single word from the deduplicated list of keywords, sorted by length in descending order. Alternation is tried left to right, so a shorter keyword listed first would win over a longer keyword that starts with it. See "Order of regular expression operator (..|.. ... ..|..)" for why this is necessary.
  • \b - a word boundary

The \0 in the replacement pattern is the replacement backreference referring to the whole match.
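
As a rough sketch of how the pieces above fit together, here is a run against the story and string_matches from the question. The variable names (keywords, pattern, whole_words, anywhere, short_first, long_first) are made up for illustration and are not part of the original answer.

```ruby
story = 'A long foo ago, in a foo bar baz, baz away...foobar'
string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar']

# Alternation is tried left to right, so the order of alternatives matters
# when no word boundaries are used.
short_first = 'foobar'.gsub(/foo|foobar/, '**\0**')
long_first  = 'foobar'.gsub(/foobar|foo/, '**\0**')

# Deduplicate, then put longer keywords first so "foobar" is tried before "foo".
keywords = string_matches.uniq.sort { |a, b| b.length <=> a.length }

# Regexp.union escapes each string and joins them with "|".
pattern = Regexp.union(keywords)

# Whole-word variant: wrap the alternation in \b ... \b.
whole_words = story.gsub(/\b(?:#{pattern.source})\b/, '**\0**')

# Substring variant: pass the union directly, no interpolation needed.
anywhere = story.gsub(pattern, '**\0**')

puts short_first
puts long_first
puts whole_words
puts anywhere
```

For this particular input both variants give the result asked for in the question. They would differ on text such as "foobarbaz", where only the substring variant matches inside the longer word.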

Wiktor Stribiżew
  • The interpolation on the 2nd one is superfluous. You can just use `story.gsub(Regexp.union(...), '**\0**')` – Stefan Mar 31 '20 at 13:46
  • I would also use `.sort_by(&:length).reverse` instead of `.sort { |a, b| b.length <=> a.length }`, which is in my opinion cleaner and more expressive. It is just personal preference and I'd understand if you leave the answer as is. – 3limin4t0r Mar 31 '20 at 13:58
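
To illustrate the two sorting styles mentioned in the comments, here is a small comparison, assuming the string_matches list from the question. The names with_block and with_sort_by are hypothetical. Both forms put the longest keyword first, though the relative order of equal-length keywords is not guaranteed to be the same.

```ruby
string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar']

# Explicit comparison block, descending by length.
with_block = string_matches.uniq.sort { |a, b| b.length <=> a.length }

# sort_by sorts ascending by length; reverse flips it to descending.
with_sort_by = string_matches.uniq.sort_by(&:length).reverse

# Compare lengths only, since ties ("foo", "bar", "baz") may come out
# in a different order between the two forms.
p with_block.map(&:length)
p with_sort_by.map(&:length)
```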

A slight change will nearly get you there:

irb(main):001:0> string_matches.uniq.each { |word| story.gsub!(/#{word}/, "**#{word}**") }
=> ["foo", "bar", "baz", "foobar"]
irb(main):002:0> story
=> "A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foo****bar**"

The trouble with the final part of the resulting string is that foobar has been matched by both foo and foobar.
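
A minimal sketch of why this happens, and one possible way around it: each gsub! call works on the output of the previous call, so by the time foobar is tried, the text already reads **foo****bar** and no longer contains foobar. Replacing everything in a single pass avoids that. The names sequential and single_pass are made up for illustration, and the single-pass variant is an assumption about what fits here, not part of the code above.

```ruby
story = 'A long foo ago, in a foo bar baz, baz away...foobar'
string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar']

# Sequential passes: every gsub! sees text already modified by earlier passes.
sequential = story.dup
string_matches.uniq.each { |word| sequential.gsub!(/#{word}/, "**#{word}**") }

# Single pass: one regex with the longest keywords first, so each position
# in the original text is matched at most once.
longest_first = string_matches.uniq.sort_by { |word| -word.length }
single_pass = story.gsub(Regexp.union(longest_first)) { |match| "**#{match}**" }

puts sequential
puts single_pass
```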

Keith Pitty