0

I need a regex pattern that will transform a raw HTML block in the following way:

<!DOCTYPE html>
...
<anytag>
   <h1>This is the less than < symbol. </h1>
   <h1>This is the less than or equal to <= symbol. </h1>
</anytag>

Transforms to:

&lt;!DOCTYPE html>
...
&lt;anytag>
   &lt;h1>This is the less than < symbol. &lt;/h1>
   &lt;h1>This is the less than or equal to <= symbol. &lt;/h1>
&lt;/anytag>

The < character should be replaced only where it opens an HTML tag, and nowhere else.

This is to solve an issue with syntax highlighting of HTML with prism.js, described here:

highlighting html with prism.js

Thanks Washington Guedes

Working

Steve Adams
  • The example is just pseudocode; it should work on any HTML tag pattern. – Steve Adams Jul 28 '15 at 16:53
  • 3
    This seems like an XY problem. I don't see why you don't just HTML escape all entities. – zzzzBov Jul 28 '15 at 16:53
  • Because I'm trying to use prism.js to syntax-highlight a raw block of HTML code, and only the < on HTML tags needs to be encoded. If I encode all of the entities, then prism breaks. :) – Steve Adams Jul 28 '15 at 16:55
  • Thanks for the advice, but I specifically need a regex pattern to do this. – Steve Adams Jul 28 '15 at 16:56
  • 2
    In well-formed HTML, all left angle brackets which are not the start of a tag should already be encoded with entities. – tripleee Jul 28 '15 at 17:01
  • "If i encode all of the entities then prism breaks." then file a bug report and use a better syntax highlighter. – zzzzBov Jul 28 '15 at 18:17

2 Answers

2

Give this a try:

<(?![^>]*<)

Regex live here.
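
To see how the negative lookahead behaves, here is a minimal Python sketch that applies the pattern to the sample from the question. The helper name `escape_tag_openers` is hypothetical and used only for illustration. The pattern treats a `<` as a tag opener when no other `<` appears before the next `>`. The same pattern should also work with JavaScript's `String.prototype.replace` and the `g` flag.

```python
import re

# Pattern from the answer above: a "<" that is NOT followed by another "<"
# before the next ">" is treated as the start of a tag.
TAG_OPEN = re.compile(r"<(?![^>]*<)")


def escape_tag_openers(html_text: str) -> str:
    """Replace only tag-opening "<" characters with "&lt;" (hypothetical helper)."""
    return TAG_OPEN.sub("&lt;", html_text)


# Sample taken from the question (the elided "..." line is left out).
sample = (
    "<!DOCTYPE html>\n"
    "<anytag>\n"
    "   <h1>This is the less than < symbol. </h1>\n"
    "   <h1>This is the less than or equal to <= symbol. </h1>\n"
    "</anytag>"
)

print(escape_tag_openers(sample))
```

One limitation to keep in mind: the pattern is a heuristic, not an HTML parser. A bare `<` in text that is followed by a `>` before any other `<` (for example `1 < 2 > 0`) looks like a tag opener to it and gets escaped as well.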

-2

Here are some common entities. You do not need to use the full code - there are common aliases for frequently used entities. For example, you can use < and > to indicate less than and greater than symbols. & is ampersand, etc.

EDIT: That should be &lt;, &gt; and &amp; (each entity ends with a semicolon).