Regex (Ruby) to capture object name and id number

Question

For example I have two potential markups:

<iframe src="http://embed.app.com/packages/495" width="850" height="480" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>

<iframe src="https://embed.app.com/videos/10332?hide_text=1&amp;buy_btn=0&amp;autoplay=0" width="960" height="540" class="responsive-embed"></iframe>

I'm looking to use Ruby to match whether the markup is for a video or a package, and also to capture its :id number.

Any assistance greatly appreciated!

Only use regex once you have used an XML parser. Otherwise http://stackoverflow.com/a/1732454/1916721 — carloabelli, Sep 25 '14 at 04:04
@chhhris you mean this http://www.rubular.com/r/XFqMeXdWZv ? — Avinash Raj, Sep 25 '14 at 04:32

score 2 · Answer 1 · answered Sep 25 '14 at 04:28

^.*?\/(?=packages|videos).*?\/(\d+)

Try this. See the demo:

http://regex101.com/r/qC9cH4/1
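
To use this from Ruby and also learn which kind of embed was matched, one option is to turn the lookahead into a capturing group. This is only a minimal sketch: the `EMBED_RE` constant and the `embed_info` helper are names made up for illustration, and the iframe strings are shortened versions of the ones in the question.

```ruby
# Hypothetical constant: same idea as the pattern above, but the lookahead
# (?=packages|videos) becomes a capturing group so the type is returned too.
EMBED_RE = %r{^.*?/(packages|videos)/(\d+)}

# Hypothetical helper: returns [type, id] as strings, or nil when the markup
# does not contain a packages/videos embed URL.
def embed_info(html)
  m = html.match(EMBED_RE)
  m && m.captures
end

package_html = '<iframe src="http://embed.app.com/packages/495" width="850" height="480"></iframe>'
video_html   = '<iframe src="https://embed.app.com/videos/10332?hide_text=1&amp;buy_btn=0" width="960"></iframe>'

[package_html, video_html].each do |html|
  type, id = embed_info(html)
  puts "#{type}: #{id}" if type
end
```

The first capture answers "video or package?" and the second holds the id, so a single match covers both.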

vks

kudos so far, that matches the ID numbers... if I didn't know which html I was parsing, how would I match if it's a `Video` or a `Package`? Sorry I'm horrible at regex and on a deadline. Thanks! – chhhris Sep 25 '14 at 04:40
@chhhris just add them as well.`(?=packages|videos|Video|Package)` and you are ready to roll – vks Sep 25 '14 at 04:41
thanks @vks, to clarify, i meant between the two example markups, how can I get the match value to equal either `packages` or `videos` instead of the `:id`. For example I got this working: http://rubular.com/r/HJ6TxYpOEO – chhhris Sep 25 '14 at 04:49
@chhhris didnt get you.You already have the match of `video` and `package`.you want both `video` `id`? – vks Sep 25 '14 at 04:53
@chhhris Try `^.*?\/(?=packages?|videos?).*?\/(\d+)` – Sep 25 '14 at 06:03

Casimir et Hippolyte · Answer 2 · 2014-09-25T04:38:48.430

An example that uses Nokogiri to find the src attributes of the iframe tags, and a pattern to extract the information:

require 'nokogiri'

html_doc = <<EOD
<iframe src="http://embed.app.com/packages/495" width="850" height="480" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>
<iframe src="https://embed.app.com/videos/10332?hide_text=1&amp;buy_btn=0&amp;autoplay=0" width="960" height="540" class="responsive-embed"></iframe>
EOD

puts "Type         ID\n----------------------"
doc = Nokogiri::HTML.parse(html_doc)
# Select only the src attribute of every iframe element.
src_list = doc.xpath('//iframe/@src')
src_list.each do |src|
  # Named groups: :type is "packages" or "videos", :id is the number that follows.
  m = src.to_s.match(/\/(?<type>packages|videos)\/(?<id>[0-9]+)/)
  printf("%-12s %s\n", m[:type], m[:id]) if m
end

score 2 · Accepted Answer · answered Sep 25 '14 at 04:54

Ruby 2.0 supports \K, which discards everything matched so far from the reported match, so you could use the regex below:

<iframe src="https?:\/\/[^\s]*?\/\K(?:videos|packages)\/\d+

OR

If you don't want to match the IDs, use this:

<iframe src="https?:\/\/[^\s]*?\/\K(?:videos|packages)

OR

This captures both the type and the ID in two separate groups:

<iframe src="https?:\/\/[^\s]*?\/\K(videos|packages)\/(\d+)
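
As a rough illustration of how this last pattern behaves in Ruby (2.0 or later, because of \K), the sketch below runs it against one of the iframes from the question. The constant `IFRAME_RE`, the method name `parse_embed` and the layout of the returned hash are assumptions made for this example, not part of the answer.

```ruby
# The two-group pattern from above. \K drops everything matched before it
# from the overall match, while the groups still capture the type and the id.
IFRAME_RE = /<iframe src="https?:\/\/[^\s]*?\/\K(videos|packages)\/(\d+)/

# Hypothetical helper: returns a hash describing the embed, or nil when the
# markup does not match.
def parse_embed(html)
  m = IFRAME_RE.match(html)
  return nil unless m
  { match: m[0], type: m[1], id: m[2].to_i }
end

html = '<iframe src="https://embed.app.com/videos/10332?hide_text=1&amp;buy_btn=0&amp;autoplay=0" width="960" height="540" class="responsive-embed"></iframe>'
p parse_embed(html)
```

Here `m[0]` is only the part after \K (`videos/10332`), while `m[1]` and `m[2]` hold the type and the id separately.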

Avinash Raj

That last one is exactly what I was looking for! – chhhris Sep 25 '14 at 05:13

score 0 · Answer 4 · edited May 23 '17 at 12:13

Only use regex once you have used an XML parser. Otherwise https://stackoverflow.com/a/1732454/1916721.

Once you get just the src attribute you can parse the link with this quick regex:

https?:\/\/embed\.app\.com\/((?:packages)|(?:videos))\/([0-9]+)

You will then get either packages or videos in the 1st capture group (you can trim the trailing s as you please). The 2nd capture group holds the id.

For an example see here: http://regex101.com/r/uF4bI1/2
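
A minimal sketch of that idea in Ruby, assuming the src value has already been pulled out of the markup by a parser. The names `SRC_RE` and `type_and_id` are hypothetical, and `chomp('s')` is just one way to trim the trailing s.

```ruby
# Pattern from the answer above, applied to the src value only.
SRC_RE = /https?:\/\/embed\.app\.com\/((?:packages)|(?:videos))\/([0-9]+)/

# Hypothetical helper: returns [singular_type, id], or nil if src does not match.
def type_and_id(src)
  m = SRC_RE.match(src)
  return nil unless m
  [m[1].chomp('s'), m[2].to_i] # chomp('s') trims the trailing "s"
end

p type_and_id('http://embed.app.com/packages/495')
# => ["package", 495]
p type_and_id('https://embed.app.com/videos/10332?hide_text=1&buy_btn=0&autoplay=0')
# => ["video", 10332]
```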

answered Sep 25 '14 at 04:23

carloabelli

If you're already employing an XML parser then why not throw in URI or Addressable to parse the URL? – mu is too short Sep 25 '14 at 04:24
@muistooshort They asked for regex so I gave them regex. That would probably be the better option though good point – carloabelli Sep 25 '14 at 04:26
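
For reference, the URI route suggested in these comments could look roughly like the sketch below. It uses Ruby's standard `uri` library and assumes the src value has already been extracted (with entities such as &amp;amp; decoded by the parser). The helper name `embed_from_src` is made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: splits the URL path instead of matching the whole URL.
# Returns [type, id], or nil when the path is not /packages/<n> or /videos/<n>.
def embed_from_src(src)
  segments = URI.parse(src).path.split('/').reject(&:empty?)
  type, id = segments
  return nil unless segments.length == 2 &&
                    %w[packages videos].include?(type) &&
                    id.match?(/\A\d+\z/)
  [type, id.to_i]
rescue URI::InvalidURIError
  nil # src was not a parseable URL
end

p embed_from_src('http://embed.app.com/packages/495')
p embed_from_src('https://embed.app.com/videos/10332?hide_text=1&buy_btn=0&autoplay=0')
```

The query string is handled by the URI parser, so it never reaches the type/id check.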

score 0 · Answer 5 · answered Sep 25 '14 at 04:32

In Ruby, the regex is:

/iframe src="https?:\/\/[^\/]+\/(packages|videos)\/(\d+)/

han058