I am working on an ASP.NET MVC web application, and I am building a URI to send to a web API. However, the UriBuilder appears to add the characters %u200b to the beginning of a parameter. Here is my method:

public string Add(string title, string account, string site, string description)
{
  XmlDocument doc = new XmlDocument();
  using (var client = new WebClient())
  {
    // Build the query string; values are URL-encoded when query.ToString() is called.
    var query = HttpUtility.ParseQueryString(string.Empty);
    query["account"] = account;
    query["site"] = site;
    query["title"] = title;
    query["description"] = description;
    string apiurl = System.Web.Configuration.WebConfigurationManager.AppSettings["ApiURL"];
    var url = new UriBuilder(apiurl);
    url.Query = query.ToString();
    // Call the web API and load the XML response.
    string xml = client.DownloadString(url.ToString());
    doc.LoadXml(xml);

The site parameter is passed to the method as Manchester (MAN), but in the final query the value has %u200b prepended to it, as follows:

https://****?account=ABC&site=%u200bManchester+(MAN)&title=ABCDE

Can anyone advise on this, please? Why is the UriBuilder adding %u200b to the parameter? The value I am passing is actually a drop-down option, and it is rendered correctly, as shown in the screenshot. If I choose another option for the site name, I do not face the problem.

[screenshot of the rendered drop-down option]

John John
  • `%u200b` is a 'zero width space' – Stephen Muecke Aug 01 '18 at 23:39
  • @StephenMuecke thanks for the reply. So what does this mean, and why is it being added to the URL parameter when the drop-down option does not have it? Do you get my point? – John John Aug 01 '18 at 23:45
  • I think you will find that it is actually in the HTML if you inspect the source and/or escape it (zero width spaces are not visible in the page). I have seen this issue before where users have cut and pasted HTML elements (although I am not sure if that is applicable in your case) – Stephen Muecke Aug 01 '18 at 23:49
  • Possible duplicate of [Does Notepad++ show all hidden characters?](https://stackoverflow.com/questions/767545/does-notepad-show-all-hidden-characters) – mjwills Aug 01 '18 at 23:49
  • @mjwills the result for `site.Length` will be 17 – John John Aug 01 '18 at 23:52
  • @johnG, if `site.Length=17`, then that confirms that the extra hidden character is there (otherwise it would be 16) – Stephen Muecke Aug 01 '18 at 23:55
  • You can probably remove it using `.Replace("\u200B", "");` (although that does not address the real issue, which is why it's added in the first place) – Stephen Muecke Aug 02 '18 at 00:01
  • @StephenMuecke could the problem be that the value for the site name `Manchester (MAN)` was stored incorrectly in the database? When I tried another option for the site name, I did NOT get this problem, so it seems the problem is specific to this site name. – John John Aug 02 '18 at 00:02
  • @johnG. That seems the most logical reason – Stephen Muecke Aug 02 '18 at 05:07
  • @StephenMuecke so, to identify the values which have this character, can I query the SQL Server DB for strings that contain it? – John John Aug 02 '18 at 13:33
  • https://dba.stackexchange.com/questions/138350/how-to-check-for-non-ascii-characters may be of assistance. – mjwills Aug 02 '18 at 13:50

1 Answer

The issue is that you have a zero width space in your string (which, as the name suggests, is 'invisible').
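One way to confirm this is to dump the code units of the string, so that invisible characters show up by number. The following is only a minimal sketch: the class and method names (`HiddenCharacterDemo`, `DumpCodePoints`) are made up for illustration, and the sample value is an assumption that mirrors the one in the question.

```csharp
using System;
using System.Linq;

public static class HiddenCharacterDemo
{
    // Hypothetical helper: renders every UTF-16 code unit as "U+XXXX",
    // so that characters with no visible glyph can still be spotted.
    public static string DumpCodePoints(string value)
    {
        return string.Join(" ", value.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

For an assumed input of `"\u200BManchester (MAN)"`, `Length` is 17 rather than the 16 visible characters, and `HiddenCharacterDemo.DumpCodePoints(...)` starts with `U+200B`.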

Unfortunately string.Trim does not remove those characters, as per the docs:

> Notes to Callers: The .NET Framework 3.5 SP1 and earlier versions maintain an internal list of white-space characters that this method trims. Starting with the .NET Framework 4, the method trims all Unicode white-space characters (that is, characters that produce a true return value when they are passed to the Char.IsWhiteSpace method). Because of this change, the Trim method in the .NET Framework 3.5 SP1 and earlier versions removes two characters, ZERO WIDTH SPACE (U+200B) and ZERO WIDTH NO-BREAK SPACE (U+FEFF), that the Trim method in the .NET Framework 4 and later versions does not remove. In addition, the Trim method in the .NET Framework 3.5 SP1 and earlier versions does not trim three Unicode white-space characters: MONGOLIAN VOWEL SEPARATOR (U+180E), NARROW NO-BREAK SPACE (U+202F), and MEDIUM MATHEMATICAL SPACE (U+205F).

Thus you need to either move to .NET 3.5 SP1 or earlier (not recommended) or use string.Replace("\u200B", "") as @StephenMuecke suggested.
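As a rough sketch of the `Replace` approach, the value could be cleaned before it is placed in the query string. The class and method names here (`QueryValueSanitizer`, `StripZeroWidth`) are hypothetical, and also stripping U+FEFF is an extra assumption beyond what was suggested in the comments; it is included only because the docs mention it alongside U+200B.

```csharp
using System;

public static class QueryValueSanitizer
{
    // Hypothetical helper: removes ZERO WIDTH SPACE (U+200B) and
    // ZERO WIDTH NO-BREAK SPACE (U+FEFF) anywhere in the value,
    // then trims ordinary leading and trailing white space.
    public static string StripZeroWidth(string value)
    {
        if (value == null)
        {
            return null;
        }

        return value
            .Replace("\u200B", "")
            .Replace("\uFEFF", "")
            .Trim();
    }
}
```

In the method from the question, this might be applied as `query["site"] = QueryValueSanitizer.StripZeroWidth(site);`, which treats the symptom in the application while the stored data still contains the character.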

Or, even better, fix your source database to remove the errant character there.

I'd also recommend installing Notepad++ to more easily see these hidden characters in future.

mjwills
  • My application is on .NET 4.5, so I am not sure why moving to 3.5 SP1 would fix this. Of course, I am not planning to make this move just to fix this problem. – John John Aug 02 '18 at 00:21
  • `so not sure why moving to 3.5 SP1 will fix this..` Read the quoted text in the answer above. It explains that the `Trim` method **used** to remove those characters. – mjwills Aug 02 '18 at 00:22