
I have a form with a text editor (Quill) where I apply $sanitize to guard against code injection.

After that, I apply the following regex:

string_regex = /[\n\r,.:?!()\]\[]|<\/?[^>]+(>|$)/

It removes all HTML tags (e.g. styling such as bold and italic) and other special characters.

But I have some problems:

  • I also want to receive HTML code as text, so I would like to keep "(", "[", etc.

  • At the same time, I want to remove these characters from "normal" words. E.g.: Michael; (John) Brian! => Michael, John, Brian

  • I want to receive accented characters (é, ó, ú, etc.) that $sanitize encodes as "&#233;", so I need the ";". For these I use HtmlDecode to show them in the view.

Is there an easy way to write a regex for that?

Or at least one that keeps the ";" for words that start with "&" and removes it for the others?
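To illustrate the fallback of keeping only the ";" that ends an entity, here is a minimal sketch. The function name `stripLooseSemicolons` is hypothetical, and the entity pattern is a simplified assumption (named, decimal and hex references only), not a full HTML parser. The idea is to match entities first so their ";" is consumed with them, and only then match a bare ";".

```javascript
// Hypothetical helper: removes ";" unless it terminates an HTML entity.
// The entity pattern is a simplified assumption: &name; &#123; &#x1f;
function stripLooseSemicolons(text) {
  // Alternation order matters: a whole entity is tried before a bare ";".
  const entityOrSemicolon = /&(?:[a-z][a-z\d]*|#\d+|#x[\da-f]+);|;/gi;
  return text.replace(entityOrSemicolon, (match) =>
    // An entity match is longer than one character, so it is kept as-is.
    match.length > 1 ? match : ""
  );
}

console.log(stripLooseSemicolons("voc&#233; John; Michael"));
```

Because the entity branch comes first in the alternation, no lookbehind is needed, so this should also work in older JavaScript engines.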

EDIT

For example, if I have the string below:

&lt;script&gt;alert()&lt;/script&gt;  <p>wow</p> voc&#233; John; Michael!

I want to receive:

&lt;script&gt;alert()&lt;/script&gt; wow voc&#233; John Michael
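
One possible way to get from the input above to that output, sketched under some assumptions: "code as text" is taken to mean an escaped element of the form `&lt;tag&gt;...&lt;/tag&gt;`, which is kept whole; entities are kept; everything else loses the punctuation from the original character class plus ";". The function name `cleanText` is hypothetical, and the whitespace collapsing at the end is an extra step not present in the original regex.

```javascript
// Hypothetical helper, a sketch rather than a complete sanitizer.
function cleanText(input) {
  // 1. Drop real HTML tags, using the tag part of the original regex.
  const withoutTags = input.replace(/<\/?[^>]+(>|$)/g, "");

  // 2. One pass with three alternatives, tried in this order:
  //    a) an escaped element &lt;tag&gt;...&lt;/tag&gt;  -> kept whole
  //    b) a single entity such as &#233; or &amp;       -> kept
  //    c) punctuation or a line break                   -> removed
  const pattern =
    /(&lt;(\w+)[\s\S]*?&lt;\/\2&gt;)|(&(?:[a-z][a-z\d]*|#\d+|#x[\da-f]+);)|[\n\r,.:;?!()\[\]]/gi;

  return withoutTags
    .replace(pattern, (match, escapedElement, tagName, entity) =>
      escapedElement || entity || ""
    )
    .replace(/ {2,}/g, " ") // collapse gaps left by removed tags
    .trim();
}

const sample =
  "&lt;script&gt;alert()&lt;/script&gt;  <p>wow</p> voc&#233; John; Michael!";
console.log(cleanText(sample));
// &lt;script&gt;alert()&lt;/script&gt; wow voc&#233; John Michael
```

An escaped tag without a matching escaped closing tag falls through to the entity branch, so its `&lt;` and `&gt;` survive but any punctuation after it is stripped like normal text.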
Matheus Oliveira
  • Here you go `(?i)(?:&(?:[a-z_:][a-z\d_:.-]*|(?:\#(?:[0-9]+|x[0-9a-f]+)))|%[a-z_:][a-z\d_:.-]*);` –  Jul 05 '17 at 18:18
  • You have some problems, because you are not clear about what you want. You want to keep parentheses, but you want to remove them. You want to keep JavaScript, but you want to remove `;`. But there are even more contradictions. – Lorenz Meyer Jul 05 '17 at 18:45
  • I want to identify whether the ';' comes from an accented character (with the "&") or from normal text – Matheus Oliveira Jul 05 '17 at 19:03
  • [This](https://stackoverflow.com/a/1732454/1575353) might be of interest here... – Sᴀᴍ Onᴇᴌᴀ Oct 27 '17 at 16:08
