How do I get a tag's value in Ruby using a regex?

Question

I have the tag:

val = "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web</a>"

In my test:

val[/(>.*<)/]

It returns:

>Mobile Web<

I want it to return only the text:

Mobile Web

Rule one, [don't use regular expressions to parse HTML or XML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). While it's tempting, it's also extremely error-prone and fragile. Instead, use a real parser. It's quite easy, more stable and less fragile. @Blender gave you the right answer. — the Tin Man, Apr 23 '13 at 05:47

score 7 · Answer 1 · answered Apr 23 '13 at 02:16


You can parse it with Nokogiri:

require 'nokogiri'

html = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'
elem = Nokogiri(html)

puts elem.text
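If installing Nokogiri isn't an option, Ruby's REXML library (a bundled gem since Ruby 3.0, part of the standard library before that) can do the same job for this particular snippet. This is a swapped-in alternative, not what the answer above uses, and it only works because the snippet happens to be well-formed XML. REXML is an XML parser, not an HTML parser, and will often reject real-world HTML. A minimal sketch:

```ruby
require 'rexml/document'

html = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'

# Parse the snippet; its single <a> element becomes the document root.
doc  = REXML::Document.new(html)
link = doc.root

text = link.text               # first text node inside <a>
href = link.attributes['href'] # attribute values are available too

puts text # Mobile Web
puts href # https://mobile.twitter.com
```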


Blender



This is a much better answer than the accepted one. Regexp is the wrong tool for parsing HTML. – dbenhur Apr 23 '13 at 05:11
It's a shame the OP didn't give a real-world example of HTML; as is, the true strength of a parser like Nokogiri isn't apparent, nor are the failings of using a regex. We have no idea what machinations went into extracting that line, but for real-world use it'd be easy with a real parser. – the Tin Man Apr 23 '13 at 05:55
Nokogiri is great! But in a big application it's slower! I prefer regex! – Darlan Dieterich Apr 23 '13 at 18:39
@DarlanDieterich: Benchmark it and tell me how much slower an HTML parser is. I really, really doubt your use case calls for a regex solution. – Blender Apr 23 '13 at 22:39
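Blender's suggestion is easy to follow with Ruby's Benchmark module. A minimal sketch, assuming REXML as the parser so it runs without extra gems (Nokogiri's timings will differ); the iteration count N is arbitrary, and no results are claimed here because they depend on the machine and the input:

```ruby
require 'benchmark'
require 'rexml/document'

HTML = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'
N = 2_000 # arbitrary iteration count

# Extract the link text with a capture group.
def via_regex(html)
  html[/>(.*)</, 1]
end

# Extract the link text with a real (XML) parser.
def via_parser(html)
  REXML::Document.new(html).root.text
end

# Sanity check: both approaches must agree before timing them.
raise 'results differ' unless via_regex(HTML) == via_parser(HTML)

Benchmark.bm(8) do |x|
  x.report('regex')  { N.times { via_regex(HTML) } }
  x.report('parser') { N.times { via_parser(HTML) } }
end
```

A one-line string is a best case for the regex; timing realistic pages end to end gives a fairer picture.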

score 2 · Answer 2 · answered Apr 23 '13 at 03:43

You can use match and select the part you want with the parentheses (a capture group):

/>(.*)</.match(val)[1]
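The same capture group can also be read without calling match explicitly: String#[] takes an optional second argument naming the group to return. A minimal sketch, reusing the val string from the question; the group name text is arbitrary:

```ruby
val = "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web</a>"

# Index 0 is the whole match, including the ">" and "<" delimiters.
whole = val[/>(.*)</, 0]

# Index 1 is the first capture group: only the text between them.
by_index = val[/>(.*)</, 1]

# Named groups work too; "text" is just an illustrative name.
by_name = val[/>(?<text>.*)</, "text"]

# Equivalent MatchData form, as used in this answer.
by_match = />(.*)</.match(val)[1]

puts whole    # >Mobile Web<
puts by_index # Mobile Web
puts by_name  # Mobile Web
puts by_match # Mobile Web
```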

I would use an HTML parsing library such as Hpricot or Nokogiri for parsing HTML, though, because regexes have a lot of corner cases that aren't apparent until the code has been running in production for months and suddenly breaks!
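To make the corner-case warning concrete, here is a small sketch with two made-up inputs: a line holding two links, and a link with nested markup. The greedy pattern runs past the first closing tag, a lazy .*? fixes that case, and nested tags still defeat both:

```ruby
two_links = '<a href="x">Mobile Web</a> | <a href="y">Desktop</a>'
nested    = '<a href="x"><b>Mobile</b> Web</a>'

# Greedy .* runs to the LAST "<" on the line, swallowing the second link.
greedy_two = two_links[/>(.*)</, 1]

# Lazy .*? stops at the FIRST "<" after the first ">", which works here...
lazy_two = two_links[/>(.*?)</, 1]

# ...but with nested markup the first "<" belongs to <b>, so the lazy
# capture is empty and the greedy one still contains the inner tags.
lazy_nested   = nested[/>(.*?)</, 1]
greedy_nested = nested[/>(.*)</, 1]

p greedy_two    # "Mobile Web</a> | <a href=\"y\">Desktop"
p lazy_two      # "Mobile Web"
p lazy_nested   # ""
p greedy_nested # "<b>Mobile</b> Web"
```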

edited Apr 23 '13 at 03:48

answered Apr 23 '13 at 03:43

anthonybell


score 0 · Accepted Answer · answered Apr 23 '13 at 02:15


A lookahead/lookbehind will work.

val[/(?<=>)(.*)(?=<)/]
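The lookbehind (?<=>) and lookahead (?=<) are zero-width: they check that ">" precedes the match and "<" follows it without making either character part of the result. A small sketch showing this with MatchData offsets; the [^<]* variant at the end is an extra, not part of the original answer, that keeps the match from running across a "<":

```ruby
val = "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web</a>"

m = val.match(/(?<=>).*(?=<)/)

# The match itself contains only the text.
text = m[0]

# The assertions inspected the neighboring characters but did not consume them.
before = val[m.begin(0) - 1]
after  = val[m.end(0)]

# Variant: [^<]* cannot cross a "<", so it stops at the first closing tag.
strict = val[/(?<=>)[^<]*(?=<)/]

p text   # "Mobile Web"
p before # ">"
p after  # "<"
p strict # "Mobile Web"
```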


Explosion Pills


score 0 · Answer 4 · answered Apr 23 '13 at 05:14


require 'nokogiri'

html = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'
elem = Nokogiri::HTML::DocumentFragment.parse(html).child

p elem.text #=> "Mobile Web"
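A fragment like this often holds more than one element. REXML (used here in place of Nokogiri, which this answer uses) requires a single root element, so the sketch below wraps the fragment in a hypothetical <root> element before collecting the text of every <a>. The two-link input is made up for illustration:

```ruby
require 'rexml/document'

fragment = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>' \
           '<a href="https://twitter.com">Web</a>'

# XML needs exactly one root, so wrap the fragment in a made-up <root>.
doc = REXML::Document.new("<root>#{fragment}</root>")

# Collect the text of every <a> element, in document order.
texts = REXML::XPath.match(doc, '//a').map(&:text)

p texts # ["Mobile Web", "Web"]
```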


Arup Rakshit

