
We've noticed that UTF8 characters don't come out correctly when using UIDevice.CurrentDevice.Name in MonoTouch.

It comes out as "iPad 2 ??" if the name contains special characters, such as the ones you get by holding down the apostrophe key on the iPad keyboard. (Sorry, I don't know how to type these characters on Windows.)

Is there a recommended workaround to get the correct text? We don't mind converting to UTF8 ourselves. I also tried simulating this with a UITextField, and it worked fine, with no UTF8 problems.

The reason this is causing problems is we are sending this text off to a web service, and it's causing XML parsing issues.

Here is a snippet of the XmlWriter code (_parser.WriteRequest):

            using (XmlWriter xmlWriter = XmlWriter.Create(textWriter, new XmlWriterSettings 
                { 
#if DEBUG
                    Indent = true,
#else
                    Indent = false, NewLineHandling = NewLineHandling.None, 
#endif
                    OmitXmlDeclaration = true 
                }))
            {
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteStartElement("REQUEST");
                xmlWriter.WriteAttributeString("TYPE", "EXAMPLE");
                xmlWriter.WriteEndElement();
                xmlWriter.WriteEndDocument();
            }

The TextWriter is passed in from:

public Response MakeRequest(Request request)
{
    var httpRequest = CreateRequest(request);

    WriteRequest(httpRequest.GetRequestStream(), request);

    using (var httpResponse = httpRequest.GetResponse() as HttpWebResponse)
    {
        using (var responseStream = httpResponse.GetResponseStream())
        {
            var response = new Response();
            ReadResponse(response, responseStream);
            return response;
        }
    }
}

private void WriteRequest(Stream requestStream, Request request)
{
    if (request.Type == null)
    {
        throw new InvalidOperationException("Request Type was null!");
    }

    if (_logger.Enabled)
    {
        var builder = new StringBuilder();
        using (var writer = new StringWriter(builder, CultureInfo.InvariantCulture))
        {
            _parser.WriteRequest(writer, request);
        }
        _logger.Log("REQUEST: " + builder.ToString());

        using (requestStream)
        {
            using (StreamWriter writer = new StreamWriter(requestStream))
            {
                writer.Write(builder.ToString());
            }
        }
    }
    else
    {
        using (requestStream)
        {
            using (StreamWriter writer = new StreamWriter(requestStream))
            {
                _parser.WriteRequest(writer, request);
            }
        }
    }
}

_logger writes to Console.WriteLine and is enabled in DEBUG builds (#if DEBUG). Request is just a storage class with properties (sorry, it's easy to confuse with HttpWebRequest).

I'm seeing ?? in both Xcode's console and MonoDevelop's console. I'm also assuming the server is receiving the characters incorrectly, since I get an error. Using UITextField.Text with the same characters instead of the device name works fine, which makes me think the device name is the culprit.

EDIT: this fixed it:

Encoding.UTF8.GetString (Encoding.ASCII.GetBytes(UIDevice.CurrentDevice.Name));

jonathanpeppers
  • It's not clear what you mean by "UTF8 characters" - UTF-8 is a way of *encoding* characters into binary. It's not a character *set*. How are you creating the XML? – Jon Skeet Mar 27 '12 at 16:46
  • Using XmlWriter in MonoTouch; the server is unknown (3rd party). To be clearer: `UITextField.Text` returns the characters as expected, but we get ? characters from `UIDevice.CurrentDevice.Name`. I'm assuming this means "unknown character". Jon, I didn't know you worked with MonoTouch. – jonathanpeppers Mar 27 '12 at 16:48
  • I haven't done any MonoTouch development, but I've seen plenty of encoding errors before. How *exactly* are you using `XmlWriter`? Please show your code. When you say "we get ? characters" - do you mean that's what the web service receives, or that's what you see in the debugger, or something else? Have you tried logging the Unicode code points of each character in the string? Usually tracking down this sort of problem is just a matter of finding out *where* you're losing data. – Jon Skeet Mar 27 '12 at 16:51
  • I wonder if the **fix** is that `UTF8` is now *explicitly* linked in. E.g. what happens if you ask for `Name` again later (without the encoding calls)? Or what does the MD debugger show? – poupou Mar 27 '12 at 19:10
  • I gave this a try, no luck. I'm not sure if it's a linking issue. – jonathanpeppers Mar 27 '12 at 19:51

2 Answers


Okay, I think I know the problem. You're creating a StringWriter, which always reports its encoding as UTF-16 (unless you override the Encoding property). You're then taking the string from that StringWriter (which will start with <?xml version="1.0" encoding="UTF-16" ?>) and writing it to a StreamWriter which will default to UTF-8. That mixture of encodings is causing the problem.
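To make the mismatch concrete, here is a minimal sketch (the class and method names are hypothetical, and it assumes a standard .NET/Mono class library). It reports the encoding a plain `StringWriter` advertises, the default encoding of a `StreamWriter`, and what `XmlWriter` puts in the declaration when writing to a `StringWriter`. The question's settings use `OmitXmlDeclaration = true`; the sketch turns the declaration back on so the advertised encoding is visible.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class: shows which encodings the writers in the question report.
public static class EncodingMismatchDemo
{
    // Encoding a plain StringWriter advertises (its text is UTF-16 in memory).
    public static string StringWriterEncoding()
    {
        using (var writer = new StringWriter())
        {
            return writer.Encoding.WebName;
        }
    }

    // Encoding a StreamWriter uses when its constructor is given no Encoding.
    public static string DefaultStreamWriterEncoding()
    {
        using (var writer = new StreamWriter(new MemoryStream()))
        {
            return writer.Encoding.WebName;
        }
    }

    // The question's REQUEST element written through a StringWriter, with the
    // XML declaration kept so the advertised encoding shows up in the output.
    public static string WriteViaStringWriter()
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = false };
        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteStartElement("REQUEST");
                xmlWriter.WriteAttributeString("TYPE", "EXAMPLE");
                xmlWriter.WriteEndElement();
                xmlWriter.WriteEndDocument();
            }
            return stringWriter.ToString();
        }
    }
}
```

On a standard framework the first value is `utf-16`, the second is `utf-8`, and the XML begins with a declaration naming `utf-16`: the text is labelled as one encoding and then written to the wire as another.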

The simplest approach would be to change your code to pass a Stream directly to the XmlWriter - a MemoryStream if you really want, or just requestStream. That way the XmlWriter can declare that it's using the exact encoding that it's actually writing the binary data in - you haven't got an intermediate step to mess things up.
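A sketch of writing straight to a stream, assuming UTF-8 without a byte order mark is what the server expects (an assumption; the question doesn't say). The `DEVICE` attribute, the class name, and the method names are hypothetical. Because the bytes are produced once, the same buffer can be decoded for the logger and copied to the request stream, so the log and the wire can't disagree.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: serialize once into bytes, then log and send those bytes.
public static class StreamRequestWriter
{
    public static byte[] WriteRequestBytes(string deviceName)
    {
        var settings = new XmlWriterSettings
        {
            // UTF-8 without a byte order mark; the declaration will name utf-8.
            Encoding = new UTF8Encoding(false),
            OmitXmlDeclaration = false
        };
        using (var buffer = new MemoryStream())
        {
            // XmlWriter encodes the characters itself, so no TextWriter sits in between.
            using (var xmlWriter = XmlWriter.Create(buffer, settings))
            {
                xmlWriter.WriteStartDocument();
                xmlWriter.WriteStartElement("REQUEST");
                xmlWriter.WriteAttributeString("TYPE", "EXAMPLE");
                // Hypothetical attribute carrying the device name.
                xmlWriter.WriteAttributeString("DEVICE", deviceName);
                xmlWriter.WriteEndElement();
                xmlWriter.WriteEndDocument();
            }
            return buffer.ToArray();
        }
    }

    // Decodes the exact bytes for logging, then copies them to the request stream.
    public static void Send(Stream requestStream, byte[] body, Action<string> log)
    {
        log("REQUEST: " + Encoding.UTF8.GetString(body));
        using (requestStream)
        {
            requestStream.Write(body, 0, body.Length);
        }
    }
}
```

In the question's `WriteRequest`, this would replace both branches: build the bytes once, then pass `requestStream`, the bytes, and something like `_logger.Log` (or a no-op delegate when logging is disabled) to `Send`.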

Alternatively, you could create a subclass of StringWriter which allows you to specify the encoding. See this answer for some sample code.
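The sample from the linked answer isn't reproduced here; the following is a minimal sketch of a `StringWriter` subclass along those lines, named `Utf8StringWriter` to match the comments below. It only changes the encoding the writer reports (which is what `XmlWriter` writes into its declaration); the characters are still stored in an ordinary .NET string.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch of a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    // UTF8Encoding(false): no byte order mark.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    public Utf8StringWriter() { }

    public Utf8StringWriter(StringBuilder builder) : base(builder) { }

    // Matches the question's new StringWriter(builder, CultureInfo.InvariantCulture).
    public Utf8StringWriter(StringBuilder builder, IFormatProvider formatProvider)
        : base(builder, formatProvider) { }

    // Only the advertised encoding changes; the stored text is unaffected.
    public override Encoding Encoding
    {
        get { return Utf8NoBom; }
    }
}
```

It can stand in for the question's `new StringWriter(builder, CultureInfo.InvariantCulture)`. If the declaration is omitted, as with `OmitXmlDeclaration = true` in the question's settings, the reported encoding never appears in the output, so this change alone has no visible effect; the final `StreamWriter` still determines the bytes sent (UTF-8 by default).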

Jon Skeet
  • The intermediate step is there so we have a `StringBuilder` to pass to our logger. Can I set the encoding to UTF8 on all my writers? – jonathanpeppers Mar 27 '12 at 17:05
  • @Jonathan.Peppers: See the last paragraph of my answer. – Jon Skeet Mar 27 '12 at 17:10
  • I tried your `Utf8StringWriter`; it didn't solve it, but I left it in for completeness. See my edit above, which worked. Is this a valid solution? Does this seem like something I should report to Xamarin? – jonathanpeppers Mar 27 '12 at 17:30
  • @Jonathan.Peppers: Absolutely not - that just limits it to ASCII. You should really log the characters in the XML, and also look closely at *exactly* what goes into the web service (using Wireshark). – Jon Skeet Mar 27 '12 at 17:38
  • Yeah, that's what I was afraid of. But if the text is coming out as ASCII from Xamarin's end, there's not much else I can do to fix it, right? I think input from @poupou will help. – jonathanpeppers Mar 27 '12 at 17:47
  • @Jonathan.Peppers: I don't see any evidence that the text is coming out in ASCII from MonoTouch. Have you performed the logging I've mentioned yet? Log out the individual `char` values within the string, casting to `int` to make sure you get the *numeric* value. – Jon Skeet Mar 27 '12 at 17:55
  • I ended up using my code above and removing the ? characters. It's awful, but I found the server seems to return the characters in a non-UTF8 encoding anyway. Our client doesn't need to support crazy characters in device names. @Jon Skeet thanks a lot for the help; not many people can diagnose UTF8 issues in iOS apps remotely from Windows! (just guessing) – jonathanpeppers Mar 27 '12 at 21:51
  • @Jonathan.Peppers: I'd still love to know what was at the bottom of it in the end, but I'm glad you've found something you're happy enough with... – Jon Skeet Mar 27 '12 at 21:52
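Both of Jon Skeet's diagnostic points can be checked with a small helper, sketched below with hypothetical names. `DescribeChars` logs the numeric value of every `char`, which separates what the string contains from how a console renders it; `AsciiRoundTrip` reproduces the workaround from the question's EDIT to show why it loses data. `DescribeChars` lists UTF-16 code units, so a character outside the Basic Multilingual Plane appears as two surrogate values.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helpers for the device-name problem.
public static class StringDiagnostics
{
    // Returns the numeric value of every UTF-16 code unit, e.g. "U+0069 U+2019",
    // independent of how any console decides to render the characters.
    public static string DescribeChars(string value)
    {
        var parts = new string[value.Length];
        for (int i = 0; i < value.Length; i++)
        {
            parts[i] = "U+" + ((int)value[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    // The workaround from the question's EDIT. Encoding.ASCII replaces every
    // character above U+007F with '?', so this drops data instead of fixing it.
    public static string AsciiRoundTrip(string value)
    {
        return Encoding.UTF8.GetString(Encoding.ASCII.GetBytes(value));
    }
}
```

For a device name ending in a curly apostrophe (U+2019), `DescribeChars` ends in `U+2019`, while `AsciiRoundTrip` turns that character into `?`, because `Encoding.ASCII` substitutes `?` for anything it cannot encode.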

MonoTouch simply calls NSString.FromHandle on the value it receives from the call to UIDevice.CurrentDevice.Name. That's just how most strings are created from NSString throughout the bindings.

That should give you a string that you can see in MonoDevelop (without any ?), so I can't rule out a bug.

Can you tell us exactly how the device is named? If so, please open a bug report and we'll check this possibility.

poupou
  • Hold down the apostrophe key on the iPad keyboard. It will give you 4 options, the two curved apostrophes are the ones causing issues. You can just print out `UIDevice.CurrentDevice.Name` in the console to see what I'm seeing. – jonathanpeppers Mar 27 '12 at 17:45
  • I renamed my iPad to **Neptune'‘a’** and it works fine (it shows correctly on the console and inside the MD debugger...). It's a long shot, but can you try building your application without linking (i.e. **Don't link**)? Just in case some code gets stripped from your application (but not from my unit tests). – poupou Mar 27 '12 at 18:00
  • Right, I'm also going to try @jon's idea of printing the integer values. – jonathanpeppers Mar 27 '12 at 18:04
  • I'm having trouble reproducing this in a new project, so I'm guessing it is my fault and Jon Skeet is correct in thinking it's just a standard .NET encoding issue. Thanks for the help; I'll post back when I figure it out. – jonathanpeppers Mar 27 '12 at 19:35
  • I wanted to add that some of my confusion came from ?? showing up in the console, even though the value was actually correct in C#. It is also correct in the MonoDevelop debugger. It seems the characters are only lost when written to the console on the device. It also happens in Xcode, so it's probably not a Xamarin issue, just confusing. – jonathanpeppers Mar 27 '12 at 21:56