1

In writing a function to convert between strings of different encodings (e.g. from UTF-8 to UTF-16), what would be the best way to handle errors (e.g. invalid input UTF-8 byte sequence)? Throwing an exception or returning an error code (even a bool)?

// Throws a C++ exception on error. 
std::wstring ConvertFromUtf8ToUtf16(const std::string& utf8);

// Returns true on success, false on error.
bool ConvertFromUtf8ToUtf16(std::wstring& utf16, const std::string& utf8);

Using exceptions, it would be possible to do chained function calls (when the function return value is used as input for other functions/methods).

But I'm not sure that using exceptions is a good idea in this case; I was thinking of what Eric Lippert, in his quality blog post, calls vexing exceptions (and the related Int32.Parse()/TryParse() example).

For example, if exceptions are used, the caller would be forced to wrap the function call in a try/catch block to handle invalid UTF-8 input:

try
{
   wstring utf16 = ConvertFromUtf8ToUtf16(utf8);
}
catch(const Utf8ConversionException& e)
{
   // Bad UTF-8 byte sequence
   ...
}

Which seems not ideal to me.

Maybe the best thing to do is to provide both overloads: implement the conversion code in the non-throwing overload, and have the throwing overload call the non-throwing one and throw an exception if it returns an error?
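The two-overload idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `std::u16string` stands in for `std::wstring` so that code units are 16 bits on every platform, `Utf8ConversionException` is given a hypothetical definition derived from `std::runtime_error`, and the helper `AppendUtf16` is an invented name. The decoder rejects overlong forms, encoded surrogates, and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception type; the question only names it.
struct Utf8ConversionException : std::runtime_error {
    Utf8ConversionException() : std::runtime_error("invalid UTF-8 byte sequence") {}
};

// Hypothetical helper: appends one already validated code point as UTF-16.
inline void AppendUtf16(std::u16string& out, char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // split into a surrogate pair
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// Non-throwing overload: returns true on success, false on malformed input.
// utf16 is only modified on success.
bool ConvertFromUtf8ToUtf16(std::u16string& utf16, const std::string& utf8)
{
    static const char32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else return false;                         // invalid lead byte
        if (len > utf8.size() - i) return false;   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < kMinForLength[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;  // overlong form, out of range, or surrogate
        }
        AppendUtf16(out, cp);
        i += len;
    }
    utf16 = std::move(out);
    return true;
}

// Throwing overload: a thin wrapper over the non-throwing one.
std::u16string ConvertFromUtf8ToUtf16(const std::string& utf8)
{
    std::u16string utf16;
    if (!ConvertFromUtf8ToUtf16(utf16, utf8)) {
        throw Utf8ConversionException();
    }
    return utf16;
}
```

With this layout the decoding logic lives in one place. Callers that expect mostly valid input can use the throwing overload in chained expressions, while callers that handle untrusted input can test the `bool` instead.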

Mr.C64
  • What errors can occur on a conversion from utf-8 to utf-16? (hint: validating input should be completely separate from converting) – Pete Becker Sep 15 '12 at 15:32
  • If you convert to UTF-16, the result should be a `std::u16string`, not a `std::wstring`. The latter has a [very specific purpose](http://stackoverflow.com/questions/6300804/wchars-encodings-standards-and-portability). – Kerrek SB Sep 15 '12 at 15:37
  • Assuming that conversion from UTF-8 will almost always succeed in practice, it's not unreasonable to handle errors using exceptions. – Jon Sep 15 '12 at 15:38

3 Answers

2

One guideline is to consider what will happen if users ignore or don't know that they should check your returned error code.

  • If the code could theoretically continue in the face of an error, returning an error could be considered reasonable. And as you mention, the code looks cleaner.
  • If ignoring the error would likely lead to Very Bad Behavior later, it's probably a better idea to throw the exception.
  • A third potential choice which somewhat balances the terseness of error codes and forcing the programmer to be aware of potential errors is to make the function require a reference to the error code. This will also work well in exported libraries and with (mostly older) compilers that don't handle exceptions efficiently.

    StringConversionResult result; // Could be a "success" bool

    wstring utf16 = ConvertFromUtf8ToUtf16(utf8, result);

Drew Dormann
0

If this function is exported from a library, use a return code. Throwing an exception from an exported function may crash the program when the library and the client are built with different C/C++ runtime libraries. Generally, this is undefined behavior.

For internal use, I believe an exception is the better choice. In the case you describe, where the caller doesn't use a catch block, the program crashes immediately (unhandled exception). This is better than continuing execution with undefined results at some future point.

Alex F
  • Yes, I know C++ exceptions can't cross module boundaries safely (unless the same C++ compiler, same CRT and same compiler settings are used). But this is not the focus of the problem: in fact, along this line of thought, also using STL classes at the interface (like STL strings) is not supported at module boundaries (again, unless same C++ compiler/CRT/settings are used). – Mr.C64 Sep 15 '12 at 16:16
0

There are only three choices. The first is to replace every failure with an error code point; the Unicode Standard provides replacement code points for this, such as U+FFFD. This is fine in some scenarios. The second is to throw an exception. The third is to provide an error function object, to be called on failure. For example,

bool fail = false;
std::u16string str = ConvertFromUTF8ToUTF16(utf8, [&] {
    return u"default";
    // or
    throw std::runtime_error("fail");
    // or
    fail = true;
});

The point is that in no scenario do you depend on the user to check for failure: if he does nothing, then either his function does not continue, the compiler cries, or it's OK for the function to continue.

Returning an error code is not an option; this is just hideously error-prone.
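The error-function-object option could look something like the sketch below. It is an illustration under assumptions, not a reference implementation: the names `DecodeOne` and `AppendUtf16` are invented, the handler is assumed to take no arguments and to return something that can be appended to a `std::u16string`, and after each failure the converter skips a single byte before resuming, which is only one of several possible resynchronisation policies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the sequence starting at utf8[i] into cp.
// Returns its length in bytes, or 0 if the sequence is malformed.
inline std::size_t DecodeOne(const std::string& utf8, std::size_t i, char32_t& cp)
{
    static const char32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    std::size_t len = 0;
    if (lead < 0x80)                { cp = lead;        len = 1; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
    else return 0;                          // invalid lead byte
    if (len > utf8.size() - i) return 0;    // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
        if ((c & 0xC0) != 0x80) return 0;   // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, out-of-range values and encoded surrogates.
    const bool bad = cp < kMinForLength[len] || cp > 0x10FFFF ||
                     (cp >= 0xD800 && cp <= 0xDFFF);
    return bad ? 0 : len;
}

// Hypothetical helper: appends one already validated code point as UTF-16.
inline void AppendUtf16(std::u16string& out, char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // split into a surrogate pair
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}

// on_error is called once per malformed byte; whatever it returns is
// appended to the output in place of that byte.
template <typename OnError>
std::u16string ConvertFromUTF8ToUTF16(const std::string& utf8, OnError on_error)
{
    std::u16string out;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = 0;
        const std::size_t len = DecodeOne(utf8, i, cp);
        if (len == 0) {
            out += on_error();  // the handler may also throw or set a flag
            i += 1;             // assumed policy: skip one byte and resume
        } else {
            AppendUtf16(out, cp);
            i += len;
        }
    }
    return out;
}
```

A caller who wants replacement characters could pass a handler that returns a one-character string holding U+FFFD, while a caller who prefers exceptions passes a handler that throws. In both cases the failure policy is chosen explicitly at the call site rather than left to a return value that might be ignored.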

Puppy