How to escape code in `<code>` tag with PHP htmlentities, even if tag has attributes

Question

Someone posted a fantastic solution here: "How can I escape all code within <code></code> tags to allow people to post code?"

The problem is that it only works for a bare <code></code> tag. It breaks with <code id="lol"></code>, for example, because that tag has an attribute. How can I account for this, so that the string inside the code tag is escaped whether or not the tag has any attributes?

I apologize if there is an obvious solution to this. Regexes give me nightmares.

Edit

As I explained in the question, the post flagged as a duplicate does not account for a <code> tag with a class or any other attribute.

The answer is already there: http://stackoverflow.com/a/9509581/3859027 – Kevin, Aug 30 '16 at 04:23
Possible duplicate of [How can I escape all code within tags to allow people to post code?](http://stackoverflow.com/questions/9509447/how-can-i-escape-all-code-within-code-code-tags-to-allow-people-to-post-code) – Nicolas Henneaux, Aug 30 '16 at 04:26
Are you kidding me, Nicolas? I linked to that question, but it is not the right answer. I'm honestly not sure if you're trolling me. – D-Marc, Aug 30 '16 at 04:31
[This isn't something I'd recommend solving with regexes](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Might I recommend getting a nice HTML parser instead? – Sebastian Lenartowicz, Aug 30 '16 at 04:49
I just need it for a very simple task. I also fully control the data, so I don't have any XSS worries. – D-Marc, Aug 30 '16 at 04:55

Sebastian Lenartowicz · Accepted Answer · 2016-08-30T05:12:42.087


In spite of my comment above, I'll endeavour to provide a regex for you to use. I would, however, emphatically recommend using an HTML parser instead of regexes.

Your regex should look a bit like this:

`<\s*code(.*?)>(.+?)<\s*\/code\s*>`

To break it down a bit,

`\s*` matches zero or more whitespace characters.

`code` matches the literal string "code".

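`(.*?)` is a capture group containing a lazy match of zero or more characters. It matches everything (if anything) up to the `>` that closes the opening tag, so any attributes, such as `id="lol"`, end up in this first group.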

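`(.+?)` is a second capture group, containing a lazy match of one or more characters: the code you want to escape. Because it is lazy, it stops at the first closing tag, so two code blocks in the same text are matched separately. Because it needs at least one character, an empty `<code></code>` pair is not matched at all.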

And, finally, `<\s*\/code\s*>` matches the closing tag, allowing for optional whitespace. Note that the slash (`/`) is escaped because PHP's `preg_*` patterns are usually written with `/` as the delimiter; with a different delimiter, such as `#`, the escape isn't needed.
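To see what the two capture groups hold, here is a minimal PHP sketch that runs the pattern through `preg_match`. It assumes the usual `/` delimiters and adds the `s` modifier, which is not in the pattern above, so that `.` also matches newlines; without it, a multi-line code sample would not match. The sample HTML is made up for illustration.

```php
<?php
// Pattern from the answer, in PHP's usual "/" delimiters (hence "\/").
// Assumption: the "s" modifier is added so "." also matches newlines.
$pattern = '/<\s*code(.*?)>(.+?)<\s*\/code\s*>/s';

// Made-up sample input with an attribute on the <code> tag.
$html = '<p>Try this:</p><code id="lol">if ($a < $b) { echo "hi"; }</code>';

if (preg_match($pattern, $html, $m) === 1) {
    // $m[0] is the whole match; groups 1 and 2 are the parts we care about.
    echo 'attributes:', $m[1], PHP_EOL; // attributes: id="lol"
    echo 'code: ', $m[2], PHP_EOL;      // code: if ($a < $b) { echo "hi"; }
}
```

A tag without attributes still matches; group 1 is then just an empty string.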

edited Aug 30 '16 at 05:12

answered Aug 30 '16 at 04:54

Sebastian Lenartowicz



Did the trick, boss. And I do agree with you. On every other project I would use a parser. I just wanted a simple solution for this case, even though it's technically not "correct". – D-Marc Aug 30 '16 at 05:04
One small detail, Sebastian. How can my replacement string retain the attribute value(s)? – D-Marc Aug 30 '16 at 05:12

@lazyboy78: Edited. I added another capture group to the opening tag that'll grab any attributes it has. – Sebastian Lenartowicz Aug 30 '16 at 05:13
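A minimal sketch of the replacement step, assuming `preg_replace_callback` with the pattern above (plus the `s` modifier for multi-line code): the attributes in group 1 are copied back into the opening tag and only group 2 goes through `htmlentities`. The helper name `escapeCodeBlocks`, the `ENT_QUOTES` flag and the UTF-8 charset are assumptions for this example, not part of the answer.

```php
<?php
/**
 * Hypothetical helper: escape the body of every <code> element with
 * htmlentities(), keeping the opening tag's attributes as they were.
 */
function escapeCodeBlocks(string $html): string
{
    // The answer's pattern, plus "s" so code can span several lines.
    $pattern = '/<\s*code(.*?)>(.+?)<\s*\/code\s*>/s';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1]: the attributes, copied back unchanged (not escaped).
            // $m[2]: the code body, escaped so the browser shows it literally.
            return '<code' . $m[1] . '>'
                . htmlentities($m[2], ENT_QUOTES, 'UTF-8')
                . '</code>';
        },
        $html
    );
}

// Prints: <code class="php">echo &quot;&lt;b&gt;hi&lt;/b&gt;&quot;;</code>
echo escapeCodeBlocks('<code class="php">echo "<b>hi</b>";</code>'), PHP_EOL;
```

The closing tag is always written back as a plain `</code>`, and the attributes are passed through unescaped, which is only acceptable because the data here is trusted. As with any regex approach, a literal `</code>` inside a code sample ends the match early; an HTML parser avoids that class of problem.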
