
Someone posted a fantastic solution here: How can I escape all code within <code></code> tags to allow people to post code?

The problem is that it only works for a bare <code></code> pair. It breaks with <code id="lol"></code>, for example, because the opening tag contains an attribute. How can I account for this, so that strings inside the code tag are always escaped, whether or not the tag has any attributes?
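To make the failure concrete, here is a small Python sketch. Python is only an illustration (the question doesn't name a language), and BARE_TAG is a hypothetical stand-in for any pattern that hard-codes the literal <code> opening tag, not the linked answer's exact code:

```python
import re

# Hypothetical stand-in for a pattern that hard-codes a bare <code> opening tag.
BARE_TAG = re.compile(r"<code>(.+?)</code>")

def find_code(text):
    """Return the contents of every <code>...</code> pair the bare pattern matches."""
    return BARE_TAG.findall(text)

print(find_code("<code>a < b</code>"))          # ['a < b']
print(find_code('<code id="lol">a < b</code>'))  # [] because the attribute breaks the literal match
```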

I apologize if there is an obvious solution to this. Regexes give me nightmares.

Edit

As I explained in the question, the post that is supposedly a duplicate does not account for a <code> tag with a class or any other attribute.

D-Marc
  • the answer is already there inside http://stackoverflow.com/a/9509581/3859027 – Kevin Aug 30 '16 at 04:23
  • Possible duplicate of [How can I escape all code within tags to allow people to post code?](http://stackoverflow.com/questions/9509447/how-can-i-escape-all-code-within-code-code-tags-to-allow-people-to-post-code) – Nicolas Henneaux Aug 30 '16 at 04:26
  • Are you kidding me, Nicolas? I linked to that question, but it is not the right answer. I'm honestly not sure if you're trolling me. – D-Marc Aug 30 '16 at 04:31
  • [This isn't something I'd recommend solving with regexes](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Might I recommend getting a nice HTML parser instead? – Sebastian Lenartowicz Aug 30 '16 at 04:49
  • I just need it for a very simple task. I also fully control the data, so I don't have any XSS worries. – D-Marc Aug 30 '16 at 04:55

1 Answer


In spite of my comment above, I'll endeavour to provide a regex for you to use. I would, however, emphatically recommend using an HTML parser instead of regexes.

Your regex should look a bit like this:

<\s*code(.*?)>(.+?)<\s*\/code\s*>
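Here is a minimal Python sketch of the pattern in use, assuming Python's re module (the thread doesn't say which language is involved). Because the pattern has two capture groups, findall returns one (attributes, contents) tuple per match:

```python
import re

# The answer's pattern: group 1 = attributes (may be empty), group 2 = tag contents.
CODE_TAG = re.compile(r"<\s*code(.*?)>(.+?)<\s*\/code\s*>")

sample = '<p>Intro</p><code>x < y</code> and <code id="lol" class="py">a && b</code>'
for attrs, body in CODE_TAG.findall(sample):
    print(repr(attrs), repr(body))
# ''  'x < y'
# ' id="lol" class="py"'  'a && b'
```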

To break it down a bit,

\s* matches zero or more whitespace characters.

code matches the literal string "code".

(.*?) is a capture group containing a lazy match of zero or more characters. It captures everything (if anything) between "code" and the > that closes the opening tag, i.e. any attributes.
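A quick Python check of what group 1 captures. One side effect: because nothing forces a word boundary after "code", a tag such as <codeblock> is matched too, with "block" captured as its "attributes". Adding \b after code, as in the STRICT pattern below (my own tweak, not part of the answer), rules that out:

```python
import re

CODE_TAG = re.compile(r"<\s*code(.*?)>(.+?)<\s*\/code\s*>")
# Suggested tweak (not from the answer): \b stops "code" matching the start of a longer tag name.
STRICT = re.compile(r"<\s*code\b(.*?)>(.+?)<\s*\/code\s*>")

def attributes(text):
    """Return group 1 (the raw attribute text) of the first match, or None if nothing matches."""
    m = CODE_TAG.search(text)
    return m.group(1) if m else None

print(repr(attributes("<code>x</code>")))           # ''
print(repr(attributes('<code id="lol">x</code>')))  # ' id="lol"'
print(repr(attributes("<codeblock>x</code>")))      # 'block'
print(STRICT.search("<codeblock>x</code>"))         # None
```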

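(.+?) is a second capture group, containing a lazy match of one or more characters: the tag's contents. Because it requires at least one character, a lone empty <code></code> pair is not matched. Note that . does not match newlines by default, so for code that spans several lines you need the dotall flag (s in PCRE/PHP, re.DOTALL in Python).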
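The effect of the lazy quantifier, the one-character minimum and the newline caveat can be seen in this Python sketch; GREEDY is a variant made up only for comparison:

```python
import re

LAZY = re.compile(r"<\s*code(.*?)>(.+?)<\s*\/code\s*>")
# Same pattern with a greedy content group, for comparison only.
GREEDY = re.compile(r"<\s*code(.*?)>(.+)<\s*\/code\s*>")

two_blocks = "<code>a</code> text <code>b</code>"
print(LAZY.findall(two_blocks))    # [('', 'a'), ('', 'b')]
print(GREEDY.findall(two_blocks))  # [('', 'a</code> text <code>b')]

# A lone empty pair is not matched, because (.+?) needs at least one character.
print(LAZY.findall("<code></code>"))  # []

# '.' skips newlines unless re.DOTALL is set.
multi = "<code>line 1\nline 2</code>"
print(LAZY.findall(multi))                         # []
print(re.findall(LAZY.pattern, multi, re.DOTALL))  # [('', 'line 1\nline 2')]
```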

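And, finally, <\s*\/code\s*> matches the closing tag, allowing whitespace after the < and before the >. Note that the slash (/) character is escaped. That is required when / is the pattern delimiter (for example, PHP preg_* patterns written as /.../, or JavaScript regex literals); in most other flavours the escape is unnecessary but harmless.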
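A small Python check of the closing-tag part. UNESCAPED is included only to show that, in a flavour like Python's where / is not a delimiter, dropping the backslash changes nothing:

```python
import re

# The closing-tag part of the answer's pattern, with the slash escaped.
CLOSING = re.compile(r"<\s*\/code\s*>")
# Python does not use / as a delimiter, so the unescaped form behaves the same.
UNESCAPED = re.compile(r"<\s*/code\s*>")

CANDIDATES = ["</code>", "< /code >", "</code   >", "</ code>"]
for candidate in CANDIDATES:
    # Whitespace is allowed after '<' and before '>', but not between '/' and 'code'.
    print(repr(candidate), bool(CLOSING.fullmatch(candidate)), bool(UNESCAPED.fullmatch(candidate)))
```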

Sebastian Lenartowicz
  • Did the trick, boss. And I do agree with you. On any other project I would use a parser. I just wanted a simple solution for this case, even though it's technically not "correct". – D-Marc Aug 30 '16 at 05:04
  • One small detail, Sebastian. How can my replacement string retain the attribute value(s)? – D-Marc Aug 30 '16 at 05:12
  • @lazyboy78: Edited. I added another capture group to the opening tag that'll grab any attributes it has. – Sebastian Lenartowicz Aug 30 '16 at 05:13
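Putting the attribute group to work: a minimal Python sketch, assuming Python's re and html modules (escape_code_blocks is a hypothetical helper name), that escapes each tag's contents and writes group 1 back into the opening tag. re.DOTALL is added so multi-line code is handled, and quote=False leaves quotes inside the code alone, since only &, < and > need escaping in element text:

```python
import html
import re

CODE_TAG = re.compile(r"<\s*code(.*?)>(.+?)<\s*\/code\s*>", re.DOTALL)

def escape_code_blocks(text):
    """Escape the contents of each <code> element, keeping its attributes."""
    def _replace(m):
        attrs, body = m.group(1), m.group(2)
        # Group 1 is re-emitted untouched; only the body is escaped.
        return "<code" + attrs + ">" + html.escape(body, quote=False) + "</code>"
    return CODE_TAG.sub(_replace, text)

print(escape_code_blocks('<code id="lol"><b>bold</b> & more</code>'))
# <code id="lol">&lt;b&gt;bold&lt;/b&gt; &amp; more</code>
```

The rebuilt tags are normalized to <code...> and </code>, so any whitespace the pattern tolerated inside the tags is dropped. Since the attribute text is copied verbatim, this only makes sense when, as in this case, you control the data.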