How to access the various occurrences of the same match group in Ruby regular expressions?

Question

I have a regular expression which has multiple matches. I figured out that $1, $2, etc. can be used to access the matched groups. But how do I access the multiple occurrences of the same matched group?

Please take a look at the rubular page below.

http://rubular.com/r/nqHP1qAqRY

So now $1 gives 916 and $2 gives nil. How can I access the 229885? Is there something similar to $1[1] or so?

How, specifically, do you want to use the matches? For example, `xml.grep(/ — Dave Newton, Aug 21 '12 at 19:05
That worked. Thanks! And I am not gonna do more complex operations on this XML. That's why I didn't go for an XML parser. But could you explain what |d| means in your code? I basically need to store those numbers in an array and read them back when needed. — Pradep, Aug 21 '12 at 20:42

robustus · Answer 1 · 2012-08-21T21:33:54.153

First, it is not a good idea to parse XML data with regular expressions alone. Use an XML parsing library such as Nokogiri instead.

But if you're sure that you want to use this approach, you need to know the following. A regex engine stops as soon as it finds a match, so a single match call will not return every match in a string. You need to iterate through the string, starting a new match after the end of the previous one. You could do it like this:

# ruby 1.9.x version
regex = /<DATA size="(\d+)"/
str = your_string # Your string to be parsed
position = 0
matches = []
while(match = regex.match(str,position)) do # Until there are no matches anymore
  position = match.end 0 # set position to the end of the last match
  matches << match[1] # add the matched number to the matches-array
end

After this, all the parsed numbers should be in matches.

But since your comment suggests that you are using Ruby 1.8.x, here is another version that works in 1.8.x (the signature of match differs between these versions: in 1.8.x it does not accept a start position).

# ruby 1.8.x version
regex = /<DATA size="(\d+)"/
str = your_string # Your string to be parsed
matches = []
while(match = regex.match(str)) do # Until there are no matches anymore
  str = match.post_match # set str to the part which is after the match.
  matches << match[1] # add the matched number to the matches-array
end

Hey, I tried this code. I am getting the following error in line 5: `match': wrong number of arguments (2 for 1) (ArgumentError). match takes two arguments, right? — Pradep, Aug 21 '12 at 20:34

score 1 · Accepted Answer · answered Aug 21 '12 at 21:02

To expand on my comment and respond to your question:

If you want to store the values in an array, modify the block and collect instead of iterate:

> arr = xml.grep(/<DATA size="(\d+)"/).collect { |d| d.match(/\d+/)[0] }
> arr.each { |a| puts "==> #{a}" }
==> 916
==> 229885

The |d| is normal Ruby block parameter syntax; each d is the matching string, from which the number is extracted. It's not the cleanest Ruby, although it's functional.
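To make the block parameter a bit more concrete, here is a minimal sketch of the same idea. The xml string is made-up sample data (one element per line), not the asker's file, and the pattern variable is just a name chosen for illustration. It also goes through each_line, because String stopped being Enumerable in Ruby 1.9, so calling grep directly on a string only works on 1.8.x.

```ruby
# Hypothetical sample input, one element per line.
xml = "<DATA size=\"916\"/>\n<DATA size=\"229885\"/>\n"
pattern = /<DATA size="(\d+)"/

# each_line yields one line at a time; grep keeps the lines matching pattern.
# Inside the block, d is bound to each matching line in turn, and
# d[pattern, 1] returns the text of capture group 1 for that line.
arr = xml.each_line.grep(pattern).map { |d| d[pattern, 1] }

arr.each { |a| puts "==> #{a}" }
```

Here arr holds plain strings, so the numbers can be stored and read back later without touching match objects.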

I still recommend using a parser; note that the rexml version would be this (more or less):

require 'rexml/document'
include REXML
doc = Document.new xml
arr = doc.elements.collect("//DATA") { |d| d.attributes["size"] }
arr.each { |a| puts "==> #{a}" }

Once your "XML" is converted to actual XML you can get even more useful data:

doc = Document.new xml
arr = doc.elements.collect("//file") do |f|
  name = f.elements["FILENAME"].attributes["path"]
  size = f.elements["DATA"].attributes["size"]
  [name, size]
end

arr.each { |a| puts "#{a[0]}\t#{a[1]}" }

~/Users/1.txt   916
~/Users/2.txt   229885

score 0 · Answer 3 · edited May 23 '17 at 10:34

This is not possible in most implementations of regex. (AFAIK only .NET can do this.)

You will have to use an alternate solution, e.g. scan(); see the question "Equivalent to Python's findall() method in Ruby?".
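A minimal sketch of the scan() approach, assuming input shaped roughly like the data in the question. The xml string below is made-up sample data, not the asker's actual file. When the pattern contains a capture group, String#scan returns one array of captures per match, so flatten turns the result into a flat list.

```ruby
# Hypothetical sample input; the real data comes from the asker's file.
xml = <<-XML
<file><FILENAME path="~/Users/1.txt"/><DATA size="916"/></file>
<file><FILENAME path="~/Users/2.txt"/><DATA size="229885"/></file>
XML

# scan collects every occurrence of the pattern, not just the first.
# Each element of the result is the array of capture groups for one match.
sizes = xml.scan(/<DATA size="(\d+)"/).flatten

sizes.each { |s| puts "==> #{s}" }
```

After this, sizes[0] and sizes[1] play the role of the $1[0] and $1[1] the question asks about.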

answered Aug 21 '12 at 18:58

Andrew Cheong