113

I'm looking for a way to perform a regex match on a string in Ruby and have it short-circuit on the first match.

The string I'm processing is long, and as far as I can tell the standard approach (the match method) would scan the whole thing, collect every match, and return a MatchData object containing all of them.

match = string.match(/regex/)[0].to_s
Daniel Beardsley

5 Answers

160

You could try String#[] (as in variableName[/regular expression/]).

This is an example output from IRB:

names = "erik kalle johan anders erik kalle johan anders"
# => "erik kalle johan anders erik kalle johan anders"
names[/kalle/]
# => "kalle"
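Two details of String#[] are worth seeing side by side: it returns nil when nothing matches, and it accepts an optional second argument that selects a capture group. The following is a minimal sketch; the variable names and the pattern /zlatan/ are made up for illustration.

```ruby
names = "erik kalle johan anders erik kalle johan anders"

# String#[] with a regex returns the first match, or nil if there is none.
first_hit = names[/kalle/]
no_hit    = names[/zlatan/]

# nil.to_s is "", so to_s gives an empty-string fallback when needed.
safe_hit = names[/zlatan/].to_s

# The optional second argument picks a capture group instead of the whole match.
after_kalle = names[/kalle (\w+)/, 1]

p first_hit, no_hit, safe_hit, after_kalle
```

Calling to_s is only one way to handle the nil case; `names[/zlatan/] || ""` would behave the same here.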
Sebastián Palma
Presidenten
  • Is this not doing a match and returning the first result behind the scenes ? – Gishu Feb 06 '09 at 10:33
  • 7
    After some benchmarking with strings of various lengths and looking at the C source, it turns out Regexp#match does short-circuit and only finds the first match. – Daniel Beardsley Feb 06 '09 at 12:17
  • 3
    Neat, didn't know about this shortcut. – Pierre Nov 21 '12 at 16:53
  • 1
    Is there some documentation on this shortcut? I searched high and low for what I thought was a relatively simple task and only solved my issue after finding this. Thanks! – dmourati Jun 19 '13 at 00:38
  • This works well, but if there are no matches it returns nil instead of an empty string. – andoke Aug 19 '13 at 13:38
  • 6
    @dmourati You can find this feature documented in [String#\[\]](http://www.ruby-doc.org/core-2.1.2/String.html#method-i-5B-5D). Thanks for asking about the doc, because in reading it I found the `capture` argument – which lets you return a capture instead of the full match. – slothbear Jul 11 '14 at 13:02
86

You can use String#[], which behaves like match:

"foo+account2@gmail.com"[/\+([^@]+)/, 1] # matches capture group 1, i.e. what is inside ()
# => "account2"
"foo+account2@gmail.com"[/\+([^@]+)/]    # matches capture group 0, i.e. the whole match
# => "+account2"
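The capture argument can also be the name of a named group, which may read better than a bare index. This is a small sketch; the group name `tag` and the variable names are assumptions chosen for the example.

```ruby
email = "foo+account2@gmail.com"
plain = "foo@gmail.com"

# Named group "tag" (hypothetical name) captures the text between "+" and "@".
pattern = /\+(?<tag>[^@]+)/

tag    = email[pattern, "tag"]  # capture selected by name
no_tag = plain[pattern, "tag"]  # nil, because the pattern does not match at all

p tag, no_tag
```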
Christopher Oezbek
Benjamin Crouzier
26

If only the existence of a match matters, you can use

/regexp/ =~ "string"

Either way, match returns only the first hit, while scan searches the entire string. For example:

matchData = "string string".match(/string/)
matchData[0]    # => "string"
matchData[1]    # => nil - it's the first capture group not a second match
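To make the contrast concrete, here is a short sketch comparing match, scan, and the existence checks on the same input. The variable names are illustrative; match? assumes Ruby 2.4 or later.

```ruby
text = "string string"

first = text.match(/string/)  # MatchData describing the leftmost match only
all   = text.scan(/string/)   # Array with every non-overlapping match

first_text  = first[0]          # the matched text
first_start = first.begin(0)    # offset where the match starts
remainder   = first.post_match  # everything after the first match

# =~ returns the index of the first match (or nil); match? returns true/false.
index  = text =~ /string/
exists = text.match?(/string/)

p first_text, first_start, remainder, all, index, exists
```

The second occurrence is still sitting in post_match, which shows that match did not consume it.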
Slartibartfast
13

I am not yet sure whether this feature is awesome or just totally crazy, but your regex can define local variables.

/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67" #=> 0
dollars #=> "3"

(Taken from http://ruby-doc.org/core-2.1.1/Regexp.html).
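One caveat: the local variables are only assigned when a regex literal is on the left-hand side of =~. When the pattern is stored in a variable, the named groups can be read from the MatchData instead. A minimal sketch, with `price`, `pattern`, and the cents arithmetic invented for illustration:

```ruby
price = "$3.67"

# Regex literal on the left of =~: named groups become local variables.
if /\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ price
  total_cents = dollars.to_i * 100 + cents.to_i
end

# Pattern held in a variable: no locals are created, so use MatchData.
pattern = /\$(?<dollars>\d+)\.(?<cents>\d+)/
m = pattern.match(price)
from_match = m[:dollars].to_i * 100 + m[:cents].to_i

p total_cents, from_match
```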

Felix
2

A Regular Expression (regex) is nothing but a finite state machine (FSM).

An FSM attempts to answer the question "Is this state possible or not?"

It keeps attempting to make a pattern match until a match is found (success), or until all paths are explored and no match was found (failure).

On success, the question "Is this state possible or not?" has been answered with a "yes". Hence no further matching is necessary and the regex returns.

See this and this for more on this.

Further: here is an interesting example that demonstrates how regex works. A regex is used to detect whether a given number is prime. The example is in Perl, but it could just as well be written in Ruby.
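For reference, a Ruby version of the well-known unary primality regex might look like the sketch below. It is not taken from the linked example; the method name `prime_by_regex?` is hypothetical, and the input is assumed to be a non-negative integer.

```ruby
# Writes n in unary (n copies of "1") and lets backtracking search for a divisor.
# The first alternative rejects 0 and 1; the second matches when the string
# splits into two or more equal groups of at least two "1"s (a composite).
def prime_by_regex?(n)
  ("1" * n) !~ /\A1?\z|\A(11+?)\1+\z/
end

primes = (1..20).select { |n| prime_by_regex?(n) }
p primes
```

This is a curiosity for seeing backtracking at work, not a practical primality test: the running time grows quickly with n.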

Community
Litmus