Ruby gsub / regex with several arguments

Question

I'm new to Ruby and I'm trying to solve a problem.

I'm parsing several text fields and want to remove the header, which can have different values. It works fine when the header is always the same:

variable = variable.gsub(/(^Header_1:$)/, '')

But when I put in several arguments it doesn't work:

variable = variable.gsub(/(^Header_1$)/ || /(^Header_2$)/ || /(^Header_3$)/ || /(^Header_4$)/ || /^:$/, '')

score 3 · Accepted Answer · 2012-11-26T17:15:23.627

You can use Regexp.union:

regex = Regexp.union(
  /^Header_1/,
  /^Header_2/,
  /^Header_3/,
  /^Header_4/,
  /^:$/
)
variable.gsub(regex, '')
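For context on why the `||` attempt in the question fails: Ruby evaluates `||` before `gsub` is called, and since a Regexp object is always truthy, the whole expression collapses to its first operand. A minimal sketch of the difference, using a made-up sample text (the text and variable names are assumptions for illustration):

```ruby
# `||` returns its first truthy operand, so the second regex is discarded.
pattern = /^Header_1$/ || /^Header_2$/
same = (pattern == /^Header_1$/)   # true

# Illustrative sample text (an assumption, not from the question).
text = "Header_1\nfirst body\nHeader_2\nsecond body\n"

# Only Header_1 is removed, because `pattern` is just /^Header_1$/.
only_first = text.gsub(pattern, '')

# Regexp.union builds one regex that matches either alternative.
union = Regexp.union(/^Header_1$/, /^Header_2$/)
both = text.gsub(union, '')
```

Here `only_first` still contains the `Header_2` line, while `both` has neither header.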

Please note that ^something$ only matches when something is the only text on a line :)

In Ruby, ^ matches at the beginning of a line and $ at the end of a line (\A and \z anchor to the whole string).

So I intentionally removed $.
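A small sketch of how the anchors behave may help here. The input strings are assumptions chosen for illustration:

```ruby
text = "Header_1\nbody line\n"   # illustrative input (assumption)

# ^ and $ match at line boundaries, so the first line is removed.
line_anchored = text.gsub(/^Header_1$/, '')

# \A and \z match only at the very start and end of the string,
# so nothing matches in a multi-line string.
string_anchored = text.gsub(/\AHeader_1\z/, '')

# With text after the header on the same line, $ prevents a match...
with_suffix = "Header_1: extra".gsub(/^Header_1$/, '')

# ...while dropping $ lets the header prefix be removed.
no_dollar = "Header_1: extra".gsub(/^Header_1/, '')
```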

Also, you do not need parentheses (a capture group) when you only want to remove the matched string.

You can also use it like this:

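headers = %w[Header_1 Header_2 Header_3]
# Escape each name so regex metacharacters in a header are matched literally;
# append further patterns to the union as needed.
regex = Regexp.union(*headers.map { |s| /^#{Regexp.escape(s)}/ }, /^:$/)
variable.gsub(regex, '')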
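As a possible variation, Regexp.union also accepts plain strings (or an array of them) and escapes them itself, so the anchor can be written once in front of the whole group. This is a sketch; the sample text is an assumption:

```ruby
headers = %w[Header_1 Header_2 Header_3]   # names taken from the answer above

# Strings passed to Regexp.union are escaped automatically.
alternatives = Regexp.union(headers)

# Interpolating a Regexp embeds it as a group, so one ^ covers every alternative.
regex = /^#{alternatives}/

text = "Header_2\nbody\nHeader_9\n"   # illustrative input (assumption)
cleaned = text.gsub(regex, '')        # Header_9 is not in the list, so it stays

# The dot is escaped, so it matches a literal "." only.
escaped = Regexp.union('a.b', 'c')
```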

And of course you can remove headers without explicitly defining them.

Most likely there is whitespace after the headers?

If so, you can do it as simply as:

variable = "Header_1 something else"
puts variable.gsub(/(^Header[^\s]*)?(.*)/, '\2')
#=>  something else

variable = "Header_BLAH something else"
puts variable.gsub(/(^Header[^\s]*)?(.*)/, '\2')
#=>  something else
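If only the leading header needs to go, a single `sub` without capture groups may be enough. The sketch below also consumes the whitespace after the header, which the version above leaves in place; `strip_header` is a hypothetical helper name, not something from the question:

```ruby
# Hypothetical helper: remove one leading "Header..." token and the
# whitespace that follows it. \A anchors to the start of the string.
def strip_header(text)
  text.sub(/\AHeader\S*\s*/, '')
end

first  = strip_header("Header_1 something else")
second = strip_header("Header_BLAH something else")
plain  = strip_header("no header here")   # left unchanged
```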

score 2 · Answer 2 · answered Nov 26 '12 at 16:48

Just use a proper regexp:

variable.gsub(/^(Header_1|Header_2|Header_3|Header_4|:)$/, '')

Jean-Louis Giordano

the Tin Man · Answer 3 · 2012-11-26T18:09:02.207

If the header always has the format Header_n, where n is some integer, then you can simplify your regex greatly:

/Header_\d+/

will find every one of these:

%w[Header_1 Header_2 Header_3].grep(/Header_\d+/)

[
    [0] "Header_1",
    [1] "Header_2",
    [2] "Header_3"
]

Tweaking it to match whole lines or whole words, not substrings:

/^Header_\d+$/

or:

/\bHeader_\d+\b/
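To make the difference between the three patterns concrete, here is a short sketch run against a few made-up sample strings (the samples are assumptions):

```ruby
samples = ['Header_1', 'see Header_2 here', 'XHeader_3', 'Header_4x']

# Whole line only: the header must be the entire line.
whole_line = samples.grep(/^Header_\d+$/)

# Whole word: the header may sit inside a line, but not inside a longer word.
whole_word = samples.grep(/\bHeader_\d+\b/)

# Unanchored: any substring match counts.
substring = samples.grep(/Header_\d+/)
```

With these samples, `whole_line` keeps only the first entry, `whole_word` keeps the first two, and `substring` keeps all four.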

As mentioned, using Regexp.union is a good start, but, used blindly, can result in very slow or inefficient patterns, so think ahead and help out the engine by giving it useful sub-patterns to work with:

values = %w[foo bar]
/Header_(?:\d+|#{ values.join('|') })/
=> /Header_(?:\d+|foo|bar)/
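One caveat when interpolating a word list like this: values containing regex metacharacters should probably be escaped first. A minimal sketch, with `a.b` added to the list purely to show the effect (that value is an assumption):

```ruby
values = %w[foo bar a.b]   # 'a.b' added only to demonstrate escaping

# Escape each value before joining, so "." matches a literal dot.
alternation = values.map { |v| Regexp.escape(v) }.join('|')
pattern = /Header_(?:\d+|#{alternation})/

literal_dot = !!('Header_a.b' =~ pattern)   # true
any_char    = !!('Header_aXb' =~ pattern)   # false: the dot is not a wildcard
digits      = !!('Header_42'  =~ pattern)   # true
```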

Unfortunately, Ruby doesn't have an equivalent of Perl's Regexp::Assemble module, which can build highly optimized patterns from big lists of words. Search here on Stack Overflow for examples of what it can do. For instance:

use Regexp::Assemble;

my @values = ('Header_1', 'Header_2', 'foo', 'bar', 'Header_3');
my $ra = Regexp::Assemble->new;
foreach (@values) {
    $ra->add($_);
}
print $ra->re, "\n";
=> (?-xism:(?:Header_[123]|bar|foo))
