I have a SQL database that does not allow nvarchar, i.e. Unicode characters. Is there a way to set the encoding on a REST service (built with Web API) so that it fails if the request contains non-ASCII characters?

newbie_86
  • "Unicode characters" encompasses **all possible characters**. You'll have to be a bit more specific for starters. Also, what platform/language/framework are you using? – deceze Jan 22 '16 at 11:39
  • I think you are saying you want to restrict the text data you send to your database to characters supported by the database's character set. You need to find out what that is. To do it in SQL see this [answer](http://stackoverflow.com/a/7321208/2226988). – Tom Blodget Jan 22 '16 at 17:57
  • Do you have your answer @newbie_86? If not, can you give us an update to tell us what you expect? – SandRock Jul 08 '16 at 09:49

1 Answer

You can use the built-in encoders of .NET to achieve charset validation.

The Encoding class gives you access to all the encodings.

Each Encoding instance lets you convert from string to byte[] and back with the methods GetBytes(string) and GetString(byte[]).

By default, the GetBytes method replaces characters it cannot convert with a '?' question mark. You can change this behavior so that it throws an exception instead.
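To make the difference between the two behaviors concrete, here is a minimal sketch that encodes the same string twice, once with the default replacement fallback and once with the exception fallback. The class name FallbackDemo and the sample string are only illustrative.

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    public static void Main()
    {
        const string value = "snow ☃";

        // Default fallback: the snowman cannot be encoded and is silently replaced by '?'.
        var lenient = Encoding.GetEncoding("iso-8859-1");
        byte[] replaced = lenient.GetBytes(value);
        Console.WriteLine(lenient.GetString(replaced)); // snow ?

        // Exception fallback: the same call throws instead of substituting.
        var strict = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(value);
        }
        catch (EncoderFallbackException ex)
        {
            // ex.Index is the position of the offending character in the input string.
            Console.WriteLine("Rejected character at index " + ex.Index);
        }
    }
}
```

The silent replacement is what makes the default dangerous for validation: the data is changed without any error being reported.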

The following code checks for characters outside ISO-8859-1 and throws an exception when it finds one.

[TestMethod]
public void VerifyEncoding_UTF8CharsGiven()
{
    var value = "hello ☃❆⛇✺⑳⑯✵ȳה";
    var encoding = Encoding.GetEncoding("iso-8859-1", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    try
    {
        var bytes = encoding.GetBytes(value);
        Assert.Fail("EncoderFallbackException should have occurred");
    }
    catch (EncoderFallbackException ex)
    {
        // Unable to translate Unicode character \u2603 at index 6 to specified code page.
        Debug.WriteLine(ex.Message);
    }
}

This one passes because every character can be encoded, so no exception is thrown.

[TestMethod]
public void VerifyEncoding_CorrectCharsGiven()
{
    var value = "hello my friends";
    var encoding = Encoding.GetEncoding("iso-8859-1", EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    var bytes = encoding.GetBytes(value); // no exception
}

You can wrap this code in a validation attribute or perform the validation in whatever way suits you. If throwing exceptions is too costly, you may want to create your own EncoderFallback that invokes a callback instead of throwing.
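As a rough sketch of the validation attribute idea, the class below derives from the DataAnnotations ValidationAttribute and reports a string as invalid when it cannot be encoded. The name EncodableAsAttribute is hypothetical (it is not part of .NET), and treating null as valid is an assumption that leaves null checks to [Required].

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Text;

// Hypothetical attribute name, used for illustration only.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field | AttributeTargets.Parameter)]
public sealed class EncodableAsAttribute : ValidationAttribute
{
    private readonly Encoding encoding;

    public EncodableAsAttribute(string encodingName)
    {
        // The exception fallback makes encoding fail instead of substituting '?'.
        encoding = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
    }

    public override bool IsValid(object value)
    {
        var text = value as string;
        if (text == null)
        {
            return true; // assumption: null is handled by [Required], not here
        }

        try
        {
            // GetByteCount runs the same fallback logic as GetBytes without allocating the array.
            encoding.GetByteCount(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

A model property could then be annotated with [EncodableAs("us-ascii")] or [EncodableAs("iso-8859-1")], depending on what the database column accepts. In Web API the failure shows up in ModelState, so the action (or a filter) still has to check ModelState.IsValid and return an error response.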

SandRock