
I just want to match & in the URL, but not XML entities like &amp;, &lt;, etc.

<a href="/test/test2">Contact Us</a>
<a href="http://www.testassociation.com/test.html?ab=5&cd=5&ab=c" target="_blank">Customer Association</a>&amp;

http://www.testassociation.com/test.html?ab=5&cd=5&ab=c

I want to replace each bare & with &amp; without disturbing the existing entities.

Sorry, I can't figure out how to do it.

I tried this:

(&)([a-z][^;]*)

Is there a better way?

mpapec
Susheel Singh

2 Answers

(?!&amp|&lt)&

You can use something like this. You will have to list every entity (&amp;, &lt;, and so on) that you want to skip. I have listed two.

See demo.

http://regex101.com/r/tA9uG5/1
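As a rough illustration of the explicit-list approach, here is a minimal Python sketch. It is a variant of the pattern above rather than the exact same regex: the lookahead sits after the & and also requires the closing semicolon. The entity list and the helper name `escape_bare_ampersands` are assumptions made for this example.

```python
import re

# Entities to leave alone. This short list is an assumption for the example;
# extend it with every entity name you need to preserve.
KNOWN_ENTITIES = ("amp", "lt", "gt", "quot", "apos")

# Match "&" only when it is NOT the start of one of the known entities.
BARE_AMP = re.compile(r"&(?!(?:%s);)" % "|".join(KNOWN_ENTITIES))

def escape_bare_ampersands(text):
    """Replace every bare '&' with '&amp;', leaving listed entities untouched."""
    return BARE_AMP.sub("&amp;", text)

url = "http://www.testassociation.com/test.html?ab=5&cd=5&ab=c"
print(escape_bare_ampersands(url))
print(escape_bare_ampersands("a &lt; b &amp; c"))
```

With this list, the two & characters in the URL are escaped, while `&lt;` and `&amp;` in the second string are left as they are.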

Edit

&(?=\w\w=)

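Use this if you don't want to list them all. Note that it only matches an & followed by a two-character parameter name and =, as in the example URL.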
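To show how the lookahead pattern from the edit behaves, here is a small Python sketch using the sample link from the question. The second pattern, `GENERAL_AMP`, is not from the answer; it is an assumed generalization for parameter names of any length.

```python
import re

# The pattern from the edit: "&" followed by two word characters and "=".
PARAM_AMP = re.compile(r"&(?=\w\w=)")

# Assumed generalization (not part of the answer): any-length parameter name.
GENERAL_AMP = re.compile(r"&(?=\w+=)")

html = ('<a href="http://www.testassociation.com/test.html?ab=5&cd=5&ab=c">'
        'Customer Association</a>&amp;')

# The trailing "&amp;" is followed by "amp;", not "xx=", so it is left alone.
fixed = PARAM_AMP.sub("&amp;", html)
print(fixed)

# A longer parameter name is missed by the two-character pattern...
short_result = PARAM_AMP.sub("&amp;", "?ab=5&page=2")
# ...but caught by the generalized one.
general_result = GENERAL_AMP.sub("&amp;", "?ab=5&page=2")
print(short_result, general_result)
```

Both patterns rely on the & being followed by `name=`, so a bare & in ordinary text (not in a query string) would not be escaped by either.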

vks

The only way to be completely accurate is, as @vks says, to include the full list of entities.

You can find this list on Wikipedia: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

If you don't need to be that accurate, and given that the longest entity, &thetasym;, has 8 characters, you can use a negative lookahead:

(?!&\w{1,8};)&

Demo

Keep in mind that this will also skip anything of the form &dffa;, even if it is not a valid entity.
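The behaviour described above can be checked with a short Python sketch. The sample strings are made up for illustration.

```python
import re

# Skip anything that looks like "&name;" with a 1-8 character name;
# every other "&" is treated as bare.
ENTITY_LIKE_AMP = re.compile(r"(?!&\w{1,8};)&")

text = 'href="test.html?ab=5&cd=5" &lt; &thetasym; &dffa; &'
result = ENTITY_LIKE_AMP.sub("&amp;", text)
print(result)

# "#" is not a word character, so numeric references are NOT skipped.
numeric = ENTITY_LIKE_AMP.sub("&amp;", "&#169;")
print(numeric)
```

Here `&lt;`, `&thetasym;` and the invalid `&dffa;` are all left alone, while the & in the query string and the trailing lone & are escaped. Because `\w` does not match `#`, a numeric reference such as `&#169;` would be rewritten to `&amp;#169;`, so the pattern would need extending if the input contains those.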

Oscar Hermosilla