34

Suppose I have the following string:

string s = "hello a & b, &lt;hello world &gt;";

I want to replace the "&" between a and b with "&amp;".

So, if I use

s.Replace("&", "&amp;");

it will also replace the "&" that belongs to &lt; and &gt;.

Is there any way I can replace only "&" between a and b?

KP Joy
  • 5
    As with all cases of string encodings, you ought to know in what format the original string is to implement a correct function. For instance, is the original string intended to be a "raw string" or a "string with HTML entities"? The [answer by Karan](https://stackoverflow.com/a/63554951/603003) is only correct for the latter. – ComFreek Aug 24 '20 at 15:08
  • 10
    Karan's answer is a good one, but you should try to avoid this situation in the first place. How did you come to have a string that's a mix of html entities and unencoded & symbols? Could you have encoded the & before combining with the <? – bmm6o Aug 24 '20 at 16:11
  • 3
    @ComFreek: As with many of the encoding questions here on SO, I think the answer to your question is: "The original string is an arbitrary mix of HTML-encoded and non-HTML-encoded strings created by someone who doesn't know what they are doing, and I have been tasked to 'fix it' in a later stage. I know that this is impossible in the general case, so please provide a hack that works for most cases and I'll pray that the edge case which finally subtly breaks it only comes around after I have left the company". – Heinzi Aug 25 '20 at 10:46

5 Answers

56

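You can use HttpUtility.HtmlEncode and HttpUtility.HtmlDecode instead, as shown below.

First decode the string to get the plain text, then encode it again. Existing entities such as &lt; are decoded and re-encoded unchanged, while the bare "&" is encoded exactly once, which gives the expected string.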

HttpUtility.HtmlEncode(HttpUtility.HtmlDecode("hello a & b, &lt;hello world &gt;"));
  • HttpUtility.HtmlDecode("hello a & b, &lt;hello world &gt;") will return hello a & b, <hello world >.

  • HttpUtility.HtmlEncode("hello a & b, <hello world >") will return hello a &amp; b, &lt;hello world &gt;
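The decode-then-encode round trip described above can be sketched as a small helper. This is only an illustration: the class name HtmlNormalizer and the method name Normalize are made up for this example, and it uses WebUtility from System.Net (mentioned in the comments below) rather than HttpUtility so that it does not need a reference to System.Web.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original answer.
public static class HtmlNormalizer
{
    // Decode first so existing entities become plain characters,
    // then encode once so every special character is escaped exactly once.
    public static string Normalize(string mixed)
    {
        string plain = WebUtility.HtmlDecode(mixed);
        return WebUtility.HtmlEncode(plain);
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("hello a & b, &lt;hello world &gt;"));
        // prints: hello a &amp; b, &lt;hello world &gt;
    }
}
```

Because the input is always decoded before it is encoded, calling Normalize on its own output should leave the string unchanged, so it is safe to apply to text that is already fully encoded.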

Klaycon
Karan
  • 14
    This is likely the most sane solution – TheGeneral Aug 24 '20 at 04:57
  • I tried using WebUtility instead of HttpUtility, but it is not decoding &lt; and &gt;. Aren't these classes the same? – KP Joy Aug 24 '20 at 05:32
  • It works the same; you can check it here: https://dotnetfiddle.net/zU1qyZ. Also see https://stackoverflow.com/questions/17352981/webutility-htmldecode-vs-httputilty-htmldecode#:~:text=The%20difference%20is%20only%20intended,general%20purpose%20or%20client%20use. – Karan Aug 24 '20 at 05:41
6

You could use regex, I suppose:

Regex.Replace("hello a & b, &lt;hello world &gt;", "&(?![a-z]{1,};)", "&amp;");
  • & match literal &
  • (?! ) negative lookahead (assert that the following does not match)
  • [a-z]{1,}; any char a-z, one or more times, followed by a single ';'
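The pattern above only recognizes lowercase named entities, so an input such as &Eacute; or &#38; would have its "&" escaped again. A slightly broader variant, given here as a sketch rather than a complete HTML entity parser, also skips mixed-case names and decimal or hexadecimal numeric references. The class name AmpersandEscaper and the method name Escape are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class AmpersandEscaper
{
    // Match "&" unless it starts something shaped like an entity:
    // a name (&lt;), a decimal reference (&#38;) or a hex reference (&#x26;).
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)");

    public static string Escape(string input)
    {
        return BareAmpersand.Replace(input, "&amp;");
    }

    public static void Main()
    {
        Console.WriteLine(Escape("hello a & b, &lt;hello world &gt;"));
        // prints: hello a &amp; b, &lt;hello world &gt;
    }
}
```

Like the original pattern, this only checks the shape of what follows the "&", not whether the entity name actually exists, so plain text that happens to look like an entity is left untouched.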
afrischke
0

You can try adding spaces on both sides of the character in the search string:

s.Replace(" & ", " &amp; ");
Cody Gray - on strike
Razack
0
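using System;
using System.Web;

string s = "hello a & b, &lt;hello world&gt;";
// Decode only the two known entities, then encode the whole string once.
var sd = s.Replace("&lt;", "<").Replace("&gt;", ">");
var e = HttpUtility.HtmlEncode(sd);
Console.WriteLine(e);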

output:

hello a &amp; b, &lt;hello world&gt;

ANSerpen
0

I think @afrischke's answer is good enough, but it may be a little too restrictive. If you only want to leave &lt; and &gt; untouched, you can use the following.

Regex.Replace("hello a & b, &lt;hello world &gt;", "&(?!(lt|gt);)", "&amp;");

&(?!(lt|gt);) : Literal "&" which is not followed by "lt;" or "gt;".

Hainan Zhao