Regular expression for matching words between <blockquote> & </blockquote>

Question

Basically I want to strip the document of words between blockquotes. I'm a regular expression newb and even after using rubular, I'm no closer to the answer.

Any help is appreciated.

score 10 · Accepted Answer · edited May 23 '17 at 12:03

Use an HTML parser and forget regular expressions. Regex is incapable of correctly handling HTML.

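require 'nokogiri'

doc = Nokogiri::HTML(your_html)     # your_html: the document as a String
doc.xpath("//blockquote").remove    # drops every <blockquote> element and its contents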

From: Strip text from HTML document using Ruby

There are more examples of how to use Nokogiri and XPath, if you look around.


answered Apr 19 '10 at 07:44

Tomalak


score 0 · Answer 2 · answered Apr 19 '10 at 07:58

raw example:

/<blockquote>([^<]*)<\/blockquote>/

answered Apr 19 '10 at 07:58

Oleg Razgulyaev


This fails for `<blockquote>Some <b>bold</b> text</blockquote>`. As I said: Regex is *technically incapable* of correctly handling HTML. – Tomalak Apr 19 '10 at 08:02
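
A quick way to see the failure this comment points at, sketched in plain Ruby. The pattern is the one from the answer above; both input strings are made up for illustration, and the second one assumes the quoted text contains an inner tag such as `<b>`.

```ruby
# Pattern from the answer: the text between the tags may not contain "<".
pattern = /<blockquote>([^<]*)<\/blockquote>/

# Hypothetical inputs, not taken from the question.
plain  = "<p>Keep</p><blockquote>Drop me</blockquote>"
nested = "<p>Keep</p><blockquote>Some <b>bold</b> text</blockquote>"

# Plain text inside the blockquote is matched and removed.
plain_result = plain.gsub(pattern, "")

# An inner tag introduces a "<", so [^<]* stops before the closing tag
# and the pattern never matches; the string comes back unchanged.
nested_result = nested.gsub(pattern, "")

puts plain_result
puts nested_result
```

The second string is returned untouched, so the blockquote silently survives the "strip".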

score 0 · Answer 3 · answered Apr 19 '10 at 08:02


Sample string:

<blockquote>Hello world</blockquote>

Type the following regex into Rubular: <blockquote>(.+?)</blockquote>

or for something more generic:

<.*?>(.+?)</.*?>

hope it helps!
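
For anyone trying these outside Rubular, here is a rough sketch of how the two patterns behave in plain Ruby. The first input is the sample string above; the nested and multi-line inputs are my own assumptions, added to show where the lazy match stops.

```ruby
# The two patterns from this answer, written as Ruby regex literals
# (the "/" in the closing tag has to be escaped inside /.../).
lazy    = /<blockquote>(.+?)<\/blockquote>/
generic = /<.*?>(.+?)<\/.*?>/

sample    = "<blockquote>Hello world</blockquote>"
# Hypothetical inputs, not from the question:
nested    = "<blockquote>Some <blockquote>quoted text</blockquote> within a quote.</blockquote>"
multiline = "<blockquote>line one\nline two</blockquote>"

# On the simple sample both patterns capture the inner text.
sample_capture  = sample[lazy, 1]
generic_capture = sample[generic, 1]

# With nested blockquotes the lazy match ends at the FIRST closing tag,
# so part of the outer quote and a dangling closing tag are left behind.
nested_removed = nested.gsub(lazy, "")

# Without the /m flag "." does not match a newline, so this returns nil.
multiline_capture = multiline[lazy, 1]

p sample_capture
p generic_capture
p nested_removed
p multiline_capture
```

So the patterns are fine for a single flat, one-line blockquote, but that is about as far as they go.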

edited Apr 19 '10 at 17:11

answered Apr 19 '10 at 08:02

Paul


This fails for `<blockquote>Some <blockquote>quoted text</blockquote> within a quote.</blockquote>`. – Tomalak Apr 19 '10 at 12:16
If we are just talking Ruby: `resultarray = htmlstring.split(/<.*?>/)`. The split() method discards the regex matches and keeps the text between them. FYI: the scan() method does the opposite and returns the matches. If you're a newb, I suggest spending some time learning regexes; they're pretty language-agnostic and will serve you well. – Paul Apr 19 '10 at 17:28
If this comment was for me: No, I'm not a "newb" as far as regular expressions go. ;) And `htmlstring.split(/<.*?>/)` fails for `Don't do HTML with RegEx`. – Tomalak Apr 19 '10 at 18:56
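
To make the split()/scan() exchange concrete, a small sketch in plain Ruby. Both inputs are made up by me; in particular the "tricky" string is only one possible input that trips the pattern (a `>` inside an attribute value), not necessarily the case the last comment had in mind.

```ruby
tag = /<.*?>/

# Hypothetical, well-behaved input.
html = "<p>Hello <b>world</b></p>"

# split keeps the text between matches; empty pieces are dropped here
# because split leaves a leading "" when the string starts with a tag.
text_parts = html.split(tag).reject(&:empty?)

# scan returns the matches themselves, i.e. the tags.
tag_parts = html.scan(tag)

# Hypothetical input with ">" inside an attribute value: the lazy match
# ends at that first ">", so the rest of the tag leaks into the "text".
tricky = '<img alt="a > b" src="x.png">after'
leaked = tricky.split(tag).reject(&:empty?)

p text_parts
p tag_parts
p leaked
```

The last result contains attribute debris rather than just `after`, which is the kind of thing a parser handles and a tag-matching regex does not.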
