How to replace a character in C# string ignoring other characters?

Question

Consider that I have the following string:

string s = "hello a & b, &lt;hello world &gt;";

I want to replace the "&" (between a and b) with "&amp;".

So, if I use

s = s.Replace("&", "&amp;");

it will also replace the "&" that belongs to &lt; and &gt;.

Is there any way I can replace only "&" between a and b?

As with all cases of string encodings, you ought to know in what format the original string is to implement a correct function. For instance, is the original string intended to be a "raw string" or a "string with HTML entities"? The [answer by Karan](https://stackoverflow.com/a/63554951/603003) is only correct for the latter. — ComFreek, Aug 24 '20 at 15:08
Karan's answer is a good one, but you should try to avoid this situation in the first place. How did you come to have a string that's a mix of html entities and unencoded & symbols? Could you have encoded the & before combining with the <? — bmm6o, Aug 24 '20 at 16:11
@ComFreek: As with many of the encoding questions here on SO, I think the answer to your question is: "The original string is an arbitrary mix of HTML-encoded and non-HTML-encoded strings created by someone who doesn't know what they are doing, and I have been tasked to 'fix it' in a later stage. I know that this is impossible in the general case, so please provide a hack that works for most cases and I'll pray that the edge case which finally subtly breaks it only comes around after I have left the company". — Heinzi, Aug 25 '20 at 10:46

score 56 · Accepted Answer · edited Aug 24 '20 at 16:47


You can instead use HttpUtility.HtmlDecode and HttpUtility.HtmlEncode, as shown below.

First decode the string to get the plain text, then encode it again; this gives you the expected string.

HttpUtility.HtmlEncode(HttpUtility.HtmlDecode("hello a & b, &lt;hello world &gt;"));

HttpUtility.HtmlDecode("hello a & b, &lt;hello world &gt;") will return hello a & b, <hello world >.
HttpUtility.HtmlEncode("hello a & b, <hello world >") will return hello a &amp; b, &lt;hello world &gt;
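The decode-then-encode round trip can be sketched as a small helper. This is a minimal sketch, not the answer's exact code: the helper name NormalizeHtml is hypothetical, and it uses WebUtility from System.Net in place of HttpUtility so that no System.Web reference is needed, on the assumption that both behave the same for this input.

```csharp
using System;
using System.Net;

// Hypothetical helper: decode every entity first, then encode exactly once,
// so already-encoded and raw characters end up in the same form.
static string NormalizeHtml(string input) =>
    WebUtility.HtmlEncode(WebUtility.HtmlDecode(input));

string s = "hello a & b, &lt;hello world &gt;";
Console.WriteLine(NormalizeHtml(s));
// prints: hello a &amp; b, &lt;hello world &gt;
```

Applying the helper to its own output returns the same string, so it is safe to run on text that is already fully encoded. It cannot tell whether a literal "&lt;" in the input was meant as raw text, which is the caveat raised in the comments on the question.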

edited Aug 24 '20 at 16:47

Klaycon


answered Aug 24 '20 at 04:57

Karan


This is likely the most sane solution – TheGeneral Aug 24 '20 at 04:57
I tried using WebUtility instead of HttpUtility, but it is not decoding &lt; and &gt;. Aren't these classes the same? – KP Joy Aug 24 '20 at 05:32
It works the same way; you can check it here: https://dotnetfiddle.net/zU1qyZ. Also refer to https://stackoverflow.com/questions/17352981/webutility-htmldecode-vs-httputilty-htmldecode#:~:text=The%20difference%20is%20only%20intended,general%20purpose%20or%20client%20use. – Karan Aug 24 '20 at 05:41

score 6 · Answer 2 · answered Aug 25 '20 at 02:31

You could use regex, I suppose:

Regex.Replace("hello a & b, &lt;hello world &gt;", "&(?![a-z]{1,};)", "&amp;");

& match literal &
(?! ) negative lookahead (assert that the following does not match)
[a-z]{1,}; any char a-z, one or more times, followed by a single ';'
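The pattern above only recognizes lowercase named entities. As a hedged extension (my own variation, not part of the answer), the lookahead can also skip uppercase or alphanumeric names and numeric references such as &#38; or &#x26;. The variable name bareAmpersand and the sample input are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Matches "&" unless it already starts something shaped like a named (&lt;),
// decimal (&#38;) or hexadecimal (&#x26;) character reference.
var bareAmpersand = new Regex(
    @"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

string input = "& a & b, &lt;tag&gt; &#38; &#x26; &";
string result = bareAmpersand.Replace(input, "&amp;");
Console.WriteLine(result);
// prints: &amp; a &amp; b, &lt;tag&gt; &#38; &#x26; &amp;
```

The check is purely on shape: any "&name;" sequence is left alone whether or not the name is a real HTML entity, so raw text that happens to look like an entity is still not escaped.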

score 0 · Answer 3 · edited Aug 24 '20 at 19:57


You can try adding spaces on both sides of the character in the search string:

s = s.Replace(" & ", " &amp; ");

edited Aug 24 '20 at 19:57

Cody Gray - on strike


answered Aug 24 '20 at 05:15

Razack


what about `&` at the start or end of string? – Iłya Bursov Aug 24 '20 at 18:44

What you have suggested can be used for this specific string. In reality there might be very different situations, as Ilya mentioned. – Naser.Sadeghi Aug 27 '20 at 21:50

score 0 · Answer 4 · answered Aug 25 '20 at 20:07


string s = "hello a & b, &lt;hello world&gt;";
var sd = s.Replace("&lt;", "<").Replace("&gt;", ">");
var e = HttpUtility.HtmlEncode(sd);
Console.WriteLine(e);

output:

hello a &amp; b, &lt;hello world&gt;
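One limitation of decoding only &lt; and &gt; by hand is that any other entity already in the string gets encoded a second time. The sketch below illustrates this with an input containing &quot;. It uses WebUtility from System.Net in place of HttpUtility, on the assumption that the two behave the same here, and the variable names are illustrative.

```csharp
using System;
using System.Net;

string s = "test &quot; test, &lt;b&gt;";

// Hand-decoding only &lt; and &gt; leaves &quot; untouched...
string partiallyDecoded = s.Replace("&lt;", "<").Replace("&gt;", ">");
// ...so encoding escapes its "&" again and produces "&amp;quot;".
string doubleEncoded = WebUtility.HtmlEncode(partiallyDecoded);
Console.WriteLine(doubleEncoded);
// prints: test &amp;quot; test, &lt;b&gt;

// Decoding every entity first avoids the double encoding.
string roundTripped = WebUtility.HtmlEncode(WebUtility.HtmlDecode(s));
Console.WriteLine(roundTripped);
// prints: test &quot; test, &lt;b&gt;
```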

answered Aug 25 '20 at 20:07

ANSerpen


what about `test &quot; test`? – Iłya Bursov Aug 31 '20 at 15:53
@IłyaBursov, I've voted for https://stackoverflow.com/a/63554951/7052021 :) – ANSerpen Sep 01 '20 at 16:43

score 0 · Answer 5 · answered Aug 31 '20 at 02:50

I think @afrischke's answer is good enough, but it may be a little too restrictive. If you only want to ignore &lt; and &gt;, you can use the following.

Regex.Replace("hello a & b, &lt;hello world &gt;", "&(?!(lt|gt);)", "&amp;");

&(?!(lt|gt);) : Literal "&" which is not followed by "lt;" or "gt;".
