-1

I have the tag:

val = "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web</a>"

In my test:

val[/(>.*<)/]

It returns:

>Mobile Web<

I want it to return only the text:

Mobile Web
Darlan Dieterich
  • Rule one, [don't use regular expressions to parse HTML or XML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). While it's tempting, it's also extremely error-prone and fragile. Instead, use a real parser. It's quite easy, more stable and less fragile. @Blender gave you the right answer. – the Tin Man Apr 23 '13 at 05:47

4 Answers

7

You can parse it with Nokogiri:

require 'nokogiri'

html = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'
elem = Nokogiri(html)

puts elem.text
Blender
  • This is a much better answer than the accepted one. Regexp is the wrong tool for parsing HTML. – dbenhur Apr 23 '13 at 05:11
  • It's a shame the OP didn't give a real-world example of HTML. As is, the true strength of a parser like Nokogiri isn't apparent, nor are the failings of using a regex. We have no idea what machinations were gone through to extract that line, but for real-world use it'd be easy with a real parser. – the Tin Man Apr 23 '13 at 05:55
  • Nokogiri is great, but in a big application it performs worse. I prefer regex! – Darlan Dieterich Apr 23 '13 at 18:39
  • @DarlanDieterich: Benchmark it and tell me how much slower an HTML parser is. I really, really doubt your use case calls for a regex solution. – Blender Apr 23 '13 at 22:39
2

You can use match and select the part you want with a capture group (the parentheses):

/>(.*)</.match(val)[1]
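As a rough sketch of why this works where the question's attempt did not: with only a regex argument, String#[] returns the whole match, angle brackets included. Reading capture group 1 returns just the text between them. The variable names below are illustrative, and the last line uses a made-up string to show what happens when nothing matches.

```ruby
val = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'

# The question's attempt: the parentheses wrap the brackets too,
# so the result still contains ">" and "<".
whole_match = val[/(>.*<)/]

# Move the brackets outside the group and read capture group 1.
via_match = />(.*)</.match(val)[1]

# String#[] also accepts a capture index as a second argument.
via_index = val[/>(.*)</, 1]

puts whole_match # >Mobile Web<
puts via_match   # Mobile Web
puts via_index   # Mobile Web

# When nothing matches, String#[] returns nil, whereas
# Regexp#match(...)[1] would raise NoMethodError on nil.
no_match = 'no tags here'[/>(.*)</, 1]
p no_match # nil
```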

I would use an HTML parsing library like Hpricot or Nokogiri for this, though. A regex has a lot of corner cases that aren't apparent until it has been running in production for months and then breaks!
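To make the corner-case concern concrete, here is a small sketch using two hypothetical inputs that are not from the question: a string with two links, and a link with a nested tag. It only illustrates how the pattern can go wrong and is not a complete list of failure modes.

```ruby
# Hypothetical inputs, chosen only to show where the regex breaks.
two_links = '<a href="/a">First</a> <a href="/b">Second</a>'
nested    = '<a href="/a"><b>Mobile</b> Web</a>'

# Greedy .* runs from the first ">" to the last "<",
# so it swallows the markup between the two links.
greedy = two_links[/>(.*)</, 1]

# Lazy .*? stops at the first "<", which handles the two-link case...
lazy = two_links[/>(.*?)</, 1]

# ...but with a nested tag it stops immediately and captures nothing.
nested_lazy = nested[/>(.*?)</, 1]

puts greedy        # First</a> <a href="/b">Second
puts lazy          # First
p    nested_lazy   # ""
```

Each fix to the pattern handles one input shape and misses another, which is the argument for using a parser instead.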

anthonybell
0

A lookahead/lookbehind will work.

val[/(?<=>)(.*)(?=<)/]
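A minimal sketch of how the lookarounds behave on the question's string: the lookbehind and lookahead assert that the brackets are there without including them in the match, so String#[] returns the text directly. The second pattern, which uses [^<]* instead of .*, is a variation added here, not part of the original answer; it keeps the match from running across a later "<".

```ruby
val = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'

# (?<=>) requires a ">" just before the match and (?=<) a "<" just after;
# neither character becomes part of the returned string.
greedy_text = val[/(?<=>)(.*)(?=<)/]

# Variation (an assumption, not from the answer): [^<]* cannot cross a "<",
# so the match ends at the first tag boundary.
bounded_text = val[/(?<=>)[^<]*(?=<)/]

puts greedy_text  # Mobile Web
puts bounded_text # Mobile Web
```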
Explosion Pills
0

require 'nokogiri'

html = '<a href="https://mobile.twitter.com" rel="nofollow">Mobile Web</a>'
elem = Nokogiri::HTML::DocumentFragment.parse(html).child

p elem.text #=> Mobile Web
Arup Rakshit