-1

I have a string that can contain HTML, like this:

 Hello my name is <a href='...'>felipe</a> and I've one brother

I need to escape only the quotes that are outside the HTML tags, so the result would be:

 Hello my name is <a href='...'>felipe</a> and I\'ve one brother

Any ideas? Gems?

HoldOffHunger
Arnold Roa

3 Answers

0

This will be hard to do if you are given a string of "free text HTML", as one would basically have to parse it.

Most templating languages do provide some way to distinguish between "raw HTML" and "String contents to be escaped". That is usually the place where you should solve this problem, so what are you using for your templates?
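As a minimal sketch of that idea in plain Ruby (no template engine), assume the text and the markup are still available as separate pieces before they are joined. Then only the text pieces need escaping and the markup can be left alone. The helper name `escape_quotes` and the variable names are hypothetical, not from the question.

```ruby
# Hypothetical helper: backslash-escape single quotes in plain text.
# The block form of gsub is used so the replacement is inserted literally,
# without gsub interpreting backslash sequences.
def escape_quotes(text)
  text.gsub("'") { "\\'" }
end

# Assumed inputs: markup and plain text kept apart until the last moment.
link   = "<a href='...'>felipe</a>"
prefix = "Hello my name is "
suffix = " and I've one brother"

# Only the plain-text parts are escaped; the link is passed through as-is.
result = escape_quotes(prefix) + link + escape_quotes(suffix)
puts result
```

A real template engine would normally offer its own raw/escaped distinction, so this only illustrates where the escaping decision is easiest to make.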

Patru
0

Here's a basic regex you can use to match single quotes that occur outside of the HTML element tags. I haven't tested it thoroughly, but it matches your input string and a few other variations that I tried.

'(?![^<]*>)

It only matches single quotes that are not followed by a '>', unless a '<' appears before that '>'.

Here's what it would look like with the substitution:

your_string.gsub(/'(?![^<]*>)/, "\\\\'")

You need all of the backslashes because gsub gives backslash sequences in the replacement string a special meaning (\' refers to the text after the match), so a literal backslash must itself be escaped.
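To make the backslash handling more concrete, here is a small sketch comparing the string form of gsub with the block form. With a block, the return value is inserted as-is, so fewer backslashes are needed. The constant and method names are made up for illustration, and the regex is the one from this answer, with the same limitations.

```ruby
# Regex from the answer: a single quote that is not followed by a '>'
# unless a '<' comes first (i.e. the quote is not inside a tag).
QUOTE_OUTSIDE_TAG = /'(?![^<]*>)/

# String replacement: the literal "\\\\'" is the three characters \\' ,
# which gsub turns into one literal backslash followed by a quote.
def escape_with_string(str)
  str.gsub(QUOTE_OUTSIDE_TAG, "\\\\'")
end

# Block replacement: the block's return value is used literally,
# so "\\'" (one backslash plus a quote) is enough.
def escape_with_block(str)
  str.gsub(QUOTE_OUTSIDE_TAG) { "\\'" }
end

input = "Hello my name is <a href='...'>felipe</a> and I've one brother"
puts escape_with_string(input)
puts escape_with_block(input)
```

Both calls should leave the quotes inside the tag untouched. Note that a literal '>' later in the plain text, with no '<' before it, would cause an earlier quote to be skipped, which is one reason the regex approach is fragile.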

Jonathan Nye
-1

From this question I learned that it is not possible to parse HTML strings with regex. Ruby has really good HTML parsers such as Nokogiri, which is the one I used to solve my problem.

This question has a really good explanation of why:

RegEx match open tags except XHTML self-contained tags

Community
Arnold Roa