
For example, I have two possible markups:

<iframe src="http://embed.app.com/packages/495" width="850" height="480" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>

<iframe src="https://embed.app.com/videos/10332?hide_text=1&amp;buy_btn=0&amp;autoplay=0" width="960" height="540" class="responsive-embed"></iframe>

I'd like to use a Ruby regex to determine whether each one is for a video or a package, and to extract its :id number.

Any assistance greatly appreciated!

chhhris

5 Answers

^.*?\/(?=packages|videos).*?\/(\d+)

Try this. See the demo:

http://regex101.com/r/qC9cH4/1

vks
  • kudos so far, that matches the ID numbers... if I didn't know which html I was parsing, how would I match if it's a `Video` or a `Package`? Sorry I'm horrible at regex and on a deadline. Thanks! – chhhris Sep 25 '14 at 04:40
  • @chhhris just add them as well: `(?=packages|videos|Video|Package)` and you are ready to roll – vks Sep 25 '14 at 04:41
  • thanks @vks, to clarify, i meant between the two example markups, how can I get the match value to equal either `packages` or `videos` instead of the `:id`. For example I got this working: http://rubular.com/r/HJ6TxYpOEO – chhhris Sep 25 '14 at 04:49
  • @chhhris I didn't follow you. You already have the match of `video` and `package`. Do you want both `video` and `id`? – vks Sep 25 '14 at 04:53
  • @chhhris Try `^.*?\/(?=packages?|videos?).*?\/(\d+)` –  Sep 25 '14 at 06:03
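
To address the follow-up in the comments, one option is to turn the lookahead into a capture group so that the type and the ID both come back from a single match. This is only a minimal sketch, assuming the markup is already available as a string; the `EMBED_RE` constant and the `embed_info` helper are names made up for this illustration.

```ruby
# Hypothetical helper: returns [type, id] for an embed iframe, or nil if nothing matches.
EMBED_RE = %r{/(packages|videos)/(\d+)}

def embed_info(markup)
  m = markup.match(EMBED_RE)
  m && [m[1], m[2].to_i]
end

package_markup = '<iframe src="http://embed.app.com/packages/495" width="850" height="480"></iframe>'
video_markup   = '<iframe src="https://embed.app.com/videos/10332?hide_text=1&amp;buy_btn=0" width="960" height="540"></iframe>'

p embed_info(package_markup)  # => ["packages", 495]
p embed_info(video_markup)    # => ["videos", 10332]
```

Group 1 holds `packages` or `videos`, and group 2 holds the ID.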

An example that uses Nokogiri to find the src attributes of the iframe tags, and a pattern to extract the information:

require 'nokogiri'

html_doc = <<EOD
<iframe src="http://embed.app.com/packages/495" width="850" height="480" frameborder="0" webkitAllowFullScreen mozallowfullscreen allowFullScreen></iframe>
<iframe src="https://embed.app.com/videos/10332?hide_text=1&amp;buy_btn=0&amp;autoplay=0" width="960" height="540" class="responsive-embed"></iframe>
EOD

puts "Type         ID\n----------------------"
doc = Nokogiri::HTML.parse(html_doc)

# Collect the src attribute of every iframe, then pull the type and ID out of each URL.
src_list = doc.xpath('//iframe/@src')
src_list.each do |src|
  if (m = src.to_s.match(/\/(?<type>packages|videos)\/(?<id>[0-9]+)/))
    printf("%-12s %s\n", m[:type], m[:id])
  end
end
Casimir et Hippolyte

Ruby 2.0 supports \K, which discards everything matched so far from the reported match. So you could use the regex below:

<iframe src="https?:\/\/[^\s]*?\/\K(?:videos|packages)\/\d+

DEMO

OR

If you don't want to match the IDs, use this:

<iframe src="https?:\/\/[^\s]*?\/\K(?:videos|packages)

DEMO

OR

This captures both in two separate groups:

<iframe src="https?:\/\/[^\s]*?\/\K(videos|packages)\/(\d+)

DEMO
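
For reference, here is a minimal sketch of how the `\K` patterns above might be used from Ruby. Because `\K` drops the text matched before it, `String#scan` returns only the `type/id` part. The `html` variable and its contents are assumptions made for the example.

```ruby
# Sample input (assumed for this sketch), trimmed from the question's markup.
html = <<~HTML
  <iframe src="http://embed.app.com/packages/495" width="850" height="480"></iframe>
  <iframe src="https://embed.app.com/videos/10332?hide_text=1" width="960" height="540"></iframe>
HTML

# No capture groups: scan returns the text matched after \K.
matches = html.scan(/<iframe src="https?:\/\/[^\s]*?\/\K(?:videos|packages)\/\d+/)

# Two capture groups: scan returns [type, id] pairs instead.
pairs = html.scan(/<iframe src="https?:\/\/[^\s]*?\/\K(videos|packages)\/(\d+)/)

p matches  # => ["packages/495", "videos/10332"]
p pairs    # => [["packages", "495"], ["videos", "10332"]]
```

Note that the IDs come back as strings; call `to_i` on them if integers are needed.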

Avinash Raj

Only use a regex after an HTML/XML parser has extracted the attribute you need. Otherwise, see https://stackoverflow.com/a/1732454/1916721.

Once you have just the src attribute, you can parse the link with this quick regex:

https?:\/\/embed\.app\.com\/((?:packages)|(?:videos))\/([0-9]+)

You will then get either packages or videos in the 1st capture group (you can trim the trailing s if you like). In the 2nd capture group you will get the id.

For an example see here: http://regex101.com/r/uF4bI1/2
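
A rough sketch of that second step in Ruby, assuming the `src` value has already been pulled out by a parser. The `parse_embed_src` helper and the `EMBED_SRC_RE` constant are hypothetical names, and the pattern is the one above with `\A` added so that it only matches at the start of the string.

```ruby
# Hypothetical helper: expects a bare src URL, not the whole iframe tag.
EMBED_SRC_RE = %r{\Ahttps?://embed\.app\.com/(packages|videos)/([0-9]+)}

def parse_embed_src(src)
  m = EMBED_SRC_RE.match(src)
  return nil unless m

  # Trim the trailing "s" to get the singular type name.
  { type: m[1].chomp("s"), id: m[2].to_i }
end

package = parse_embed_src("http://embed.app.com/packages/495")
video   = parse_embed_src("https://embed.app.com/videos/10332?hide_text=1&buy_btn=0&autoplay=0")
other   = parse_embed_src("https://example.com/videos/1")  # different host, so no match

puts "#{package[:type]} #{package[:id]}"  # prints "package 495"
puts "#{video[:type]} #{video[:id]}"      # prints "video 10332"
p other                                   # => nil
```

Returning `nil` for anything that is not an embed URL keeps the caller's check simple.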

Community
carloabelli

In Ruby, the regex is:

/iframe src="https?:\/\/[^\/]+\/(?:packages|videos)\/(\d+)/
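
One thing worth checking with patterns of this shape: in a regex, square brackets define a character class, while alternation between whole words needs a group. A small sketch of the difference, using two throwaway patterns and a made-up URL (all names here are assumptions for the illustration):

```ruby
# Character class: one or more of the characters p, a, c, k, g, e, s, |, v, i, d, o.
char_class  = %r{/[packages|videos]+/(\d+)}
# Alternation: exactly the word "packages" or the word "videos".
alternation = %r{/(?:packages|videos)/(\d+)}

unrelated = "http://embed.app.com/spaces/77"  # hypothetical URL, neither a package nor a video

p char_class.match?(unrelated)   # => true
p alternation.match?(unrelated)  # => false
```

The character-class version accepts any path segment built from those letters, which is why the grouped form is the safer choice.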
han058