
I'm trying to work out the best way to encode text in C# so that it meets the requirements of a new SMS provider.

The text I want to send is:

Bäste Björn

The encoded text that the provider says it needs is:

B%E4ste+Bj%F6rn

so ä is %E4 and ö is %F6


From this answer, I understood that for this conversion I need to use HttpUtility.HtmlAttributeEncode, because the normal HttpUtility.UrlEncode outputs:

B%c3%a4ste+Bj%c3%b6rn

and that shows up as weird characters on the mobile phone :/

As several characters were not converted, I tried this:

private string specialEncoding(string text)
{
    StringBuilder r = new StringBuilder();
    foreach (char c in text.ToCharArray())
    {
        string e = System.Web.HttpUtility.UrlEncode(c.ToString());
        if (e.StartsWith("%") && e.ToLower() != "%0a") // %0a == Linefeed
        {
            string attr = System.Web.HttpUtility.HtmlAttributeEncode(c.ToString());
            r.Append(attr);
        }
        else
        {
            r.Append(e);
        }

    }
    return r.ToString();
}

It is verbose so that I could set a breakpoint and test each character, and I found out that:

System.Web.HttpUtility.HtmlAttributeEncode("ä") actually returns ä unchanged, so there is no %E4 in the output.

What am I missing? And is there a simple way to do the encoding and get the required output without manipulating the text char by char?

balexandre

1 Answer


that the provider says it needs

Ask the provider which era they are living in. According to Wikipedia: Percent-encoding:

The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values. This requirement was introduced in January 2005 with the publication of RFC 3986. URI schemes introduced before this date are not affected.

Granted, this RFC talks about "new URI schemes", which HTTP obviously is not, but adhering to this standard prevents headaches like this. See also What is the proper way to URL encode Unicode characters?.

They seem to want you to encode characters according to the Windows-1250 code page (or a comparable one, like ISO-8859-1 or ISO-8859-2; check the alternatives here) instead, because in those code pages E4 (decimal 228) maps to ä and F6 (decimal 246) maps to ö. As @Simon points out in his comment, you should ask the provider exactly which code page they want you to use.
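To see where the two different outputs come from, here is a minimal sketch that prints the bytes each encoding produces for ä and ö. The class and method names are made up for illustration. It uses ISO-8859-1 as a stand-in, since Windows-1250 gives the same two byte values for these characters.

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // Returns the bytes of `text` in the given encoding as hex, e.g. "C3-A4".
    public static string HexBytes(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for whatever single-byte
        // code page the provider actually expects.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // \u00E4 is ä and \u00F6 is ö; escapes avoid source-file encoding issues.
        Console.WriteLine(HexBytes("\u00E4", Encoding.UTF8)); // C3-A4
        Console.WriteLine(HexBytes("\u00E4", latin1));        // E4
        Console.WriteLine(HexBytes("\u00F6", Encoding.UTF8)); // C3-B6
        Console.WriteLine(HexBytes("\u00F6", latin1));        // F6
    }
}
```

Percent-encoding simply writes each of those bytes as %XX, which is why UTF-8 gives two escapes per character and a single-byte code page gives one. Note that on .NET Core and later, code page encodings such as 1250 are not available by default, so Encoding.GetEncoding(1250) may need a code page encoding provider to be registered first.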

Assuming Windows-1250, you can implement it like this, according to URL encode ASCII/UTF16 characters:

var windows1250 = Encoding.GetEncoding(1250);
var percentEncoded = HttpUtility.UrlEncode("Bäste Björn", windows1250);

The value of percentEncoded is:

B%e4ste+Bj%f6rn

If they insist on using uppercase, see .net UrlEncode - lowercase problem.
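One possible way to get uppercase escapes, sketched here with a hypothetical helper name, is to post-process the encoded string with a regular expression that touches only the %xx sequences. This is just one approach, not necessarily the one the linked question settles on.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PercentEncodingCase
{
    // Upper-cases every %xx escape and leaves all other characters untouched.
    public static string UppercaseEscapes(string encoded)
    {
        return Regex.Replace(encoded, "%[0-9a-fA-F]{2}", m => m.Value.ToUpperInvariant());
    }

    public static void Main()
    {
        // Input is the lowercase result shown above.
        Console.WriteLine(UppercaseEscapes("B%e4ste+Bj%f6rn")); // B%E4ste+Bj%F6rn
    }
}
```

Because only the escape sequences are matched, letters in the rest of the text keep their original case.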

CodeCaster
  • they pointed out that the encode "table" I should use is the one in http://www.w3schools.com/tags/ref_urlencode.asp ... [yeahh, w3schools... I know](http://www.w3fools.com/) – balexandre Mar 26 '14 at 11:03
  • +1 yes, cp437 seems a little odd. FWIW ASCII strictly doesn't have code pages. ASCII is a 7 bit encoding. cp437 is a DOS/Windows specific thing. – David Heffernan Mar 26 '14 at 11:05
  • @balexandre Except that page lists “ä” as `%C4`, seemingly following [ISO 8859-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1). – svick Mar 26 '14 at 11:11
  • @balexandre their JavaScript encoder on that page encodes `ä` to `%C3%A4`... there's a reason w3fools exists. – CodeCaster Mar 26 '14 at 11:11
  • @CodeCaster I know about w3fools, I use [Mozilla Developer Network](http://developer.mozilla.org) every time I need something... – balexandre Mar 26 '14 at 11:13
  • 1250 is "Windows-1250". In fact, it could be "ISO-8859-2" (28592). It really depends on this "so-called" SMS provider. Only he can tell what he *really* needs... – Simon Mourier Mar 26 '14 at 11:14
  • Just tested, and the text now comes through correctly using code page `1250`. **Thank you** @SimonMourier and @CodeCaster. **BTW** the SMS provider is http://www.Infobip.com – balexandre Mar 26 '14 at 11:36