We have an old C++ app that makes calls to third-party webservices using WinHttp.WinHttpRequest.5.1.

I won't list all of the details of the call sequence, as I don't think they're relevant to the problem, but we finish by calling `hr = pIWinHttpRequest->get_ResponseText(&bstrResponse);`, where `bstrResponse` is of type `BSTR`.

The calling code doesn't work with BSTRs; it works with standard C/C++ `char *` strings, so the code converts the BSTR to a `char *` with:

```cpp
_bstr_t b(bstrResponse);
const char *c = static_cast<char *>(b);
```

For all of the webservices we've previously accessed with this code, this has worked, but for this new one it doesn't.

The data we're getting back is supposed to be XML, but for this one webservice it looks like we're getting some character-conversion problems. Our resulting string starts with: `?<?xml version="1.0" encoding="utf-8"?>...`

Notice the extra ? at the beginning. When we step through this in the debugger, we don't see it in the displayed value of `bstrResponse` or in the displayed value of `b`, but we do see it in the displayed value of `c`.

Any ideas as to what might be going on?

EDITED

I understand that a BSTR is a wide-character type, but all of the characters in this string are plain ASCII, and none of the code that calls this function can handle multi-byte characters. Browsing around the web, I see this specific mechanism recommended frequently, but in this case it doesn't work.

I need to convert this string from a BSTR to an array of single-byte characters, even if that means stripping out characters that cannot be converted.

nhahtdh
Jeff Dege
  • It won't work because a BSTR isn't a string of byte-length characters. I refer you to [this question](http://stackoverflow.com/questions/6284524/bstr-to-stdstring-stdwstring-and-vice-versa) for details. – Jack Aidley Jan 21 '13 at 16:42
  • have you maybe considered that `static_cast` might not be a valid way to convert arbitrary types into printable `char*` strings? ;) – jalf Jan 21 '13 at 16:42
  • I understand that BSTR is a multi-byte type, but all of the characters in this string are plain ASCII, and none of the code that calls this function can handle multi-byte characters. – Jeff Dege Jan 21 '13 at 16:59
  • 1
    @Jeff They are not necessarily plain ASCII, according to the XML tag they are UTF-8. I’m assuming that the question mark you’re seeing is the [byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark). – Konrad Rudolph Jan 21 '13 at 17:00
  • I'd have thought so, except that all of the other webservices we're hitting are also coming back UTF-8, and aren't showing this problem. – Jeff Dege Jan 21 '13 at 17:02
  • @Jeff The byte order mark on UTF-8 is optional (and in fact not recommended) so it’s not surprising that most sites don’t serve it. – Konrad Rudolph Jan 21 '13 at 17:03
  • @JeffDege: I edited your question to add formatting. Can you check that the XML part: `"?<?xml` is displaying correctly? Or should it appear as `? – nhahtdh Jan 21 '13 at 17:03
  • It should appear with the leading ? – Jeff Dege Jan 21 '13 at 17:24
  • For what it's worth, I've tried stripping out the leading '?', and the parsing is still failing. – Jeff Dege Jan 21 '13 at 17:25
  • Have you tried `ConvertBSTRToString`? http://msdn.microsoft.com/en-us/library/ewezf1f6%28VS.80%29.aspx –  Jan 21 '13 at 17:28
  • 1
    @0A0D The `static_cast` results in a call to `ConvertBSTRToString`. – David Heffernan Jan 21 '13 at 17:47
  • What's with the close votes? This is not a dupe of that question!! – David Heffernan Jan 21 '13 at 22:20
  • @HeathHunnicutt No, I cannot. How else do you expect `_bstr_t` to do it? Perhaps I should have said that the result of the cast would be indistinguishable from `ConvertBSTRToString`. – David Heffernan Jan 22 '13 at 18:48
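Konrad Rudolph's byte-order-mark explanation can be checked directly. The sketch below is portable C++ and assumes the response text has been copied into a `std::u16string` (on Windows `wchar_t` is 16 bits wide, so a BSTR's contents copy over directly) and, for the byte-level check, that the raw response bytes are available, for example from the request's `ResponseBody` property rather than `ResponseText`. Both helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: in UTF-16 text (such as the contents of a BSTR)
// the byte order mark is the single code unit U+FEFF.
bool starts_with_utf16_bom(const std::u16string& s) {
    return !s.empty() && s[0] == u'\uFEFF';
}

// Hypothetical helper: in raw UTF-8 bytes the same mark is the
// three-byte sequence EF BB BF.
bool starts_with_utf8_bom(const unsigned char* data, std::size_t len) {
    return len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}
```

Since the UTF-8 BOM is optional, most services omit it, which would explain why only this one webservice shows the problem.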

2 Answers

The conversion in your code, using `static_cast` on a `_bstr_t`, converts to ANSI correctly. The appearance of ? in an encoding conversion indicates that the conversion of a character failed. The most likely reason is that `bstrResponse` contains characters that are not present in your ANSI code page. I suspect you should be converting to UTF-8 rather than ANSI, but of course I don't have all the information that you have.

The bottom line is that the ? indicates that the source string contains a character that cannot be encoded in the destination character set.
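The following sketch illustrates that mechanism. It is only an approximation of a UTF-16 to Windows-1252 conversion (the real `_bstr_t` conversion uses whatever ANSI code page the system is configured with), and `lossy_to_cp1252` is a hypothetical name. Characters the target code page can hold are copied; anything else, including the byte order mark U+FEFF, becomes '?'.

```cpp
#include <string>

// Hypothetical helper approximating a lossy UTF-16 -> Windows-1252 ("ANSI")
// conversion. U+0000..U+007F and U+00A0..U+00FF map to the same byte value;
// everything else (including the BOM U+FEFF) becomes the default character '?'.
// Simplifications: the real code page 1252 also maps a few other characters
// (e.g. the euro sign to 0x80), and a surrogate pair yields "??" here.
std::string lossy_to_cp1252(const std::u16string& wide) {
    std::string out;
    out.reserve(wide.size());
    for (char16_t ch : wide) {
        if (ch < 0x80 || (ch >= 0xA0 && ch <= 0xFF)) {
            out.push_back(static_cast<char>(ch));
        } else {
            out.push_back('?');  // not representable in the target code page
        }
    }
    return out;
}
```

With a response that begins with U+FEFF, the result starts with `?<?xml`, matching the symptom in the question, while a copyright sign survives as the single byte 0xA9.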

Update

Your answer gives further evidence that you should be converting to UTF-8. Only you can know for sure, but the evidence you present is consistent with that conclusion.
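On Windows, the usual way to get UTF-8 from a BSTR is `WideCharToMultiByte` with `CP_UTF8` (passing `SysStringLen(bstr)` as the length). To show the idea without Windows headers, here is a portable sketch of the same UTF-16 to UTF-8 encoding, assuming the BSTR's contents have been copied into a `std::u16string`; `utf16_to_utf8` is a hypothetical helper, not part of any Windows API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode UTF-16 text (the representation a BSTR uses)
// as UTF-8. Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {  // 1 byte
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {  // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {  // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Note that a leading BOM is preserved as the bytes EF BB BF, so a parser that does not expect it may still need it stripped.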

David Heffernan
  • Which would be true, if the string included a BOM. – Jeff Dege Jan 21 '13 at 17:55
  • I don't understand what you mean. Looking at the answer you posted, it would seem that my analysis was accurate. – David Heffernan Jan 21 '13 at 17:58
  • I'm not disagreeing with you. Because this string has a BOM, its first character after conversion is '?'. – Jeff Dege Jan 21 '13 at 22:14
  • I needed to convert to 7-bit ASCII, even if that means stripping out or converting characters that cannot be represented in 7-bit ASCII. (I also need to convince my bosses that we should find a better XML parser than the one we're using, in this app.) – Jeff Dege Jan 22 '13 at 21:20
  • What makes you think that? All the signs are that UTF-8 is what you need. That's the native encoding for XML. I'd expect your parser to work on UTF-8. – David Heffernan Jan 22 '13 at 21:23
It turns out there were two problems. First, the conversion process described above does not strip out the byte order mark, which in my mind it should. Second, the old C++ XML parser we are using chokes on 8-bit characters, and this webservice is sending us a copyright symbol in its text, which comes through as '\xA9' (its value in Latin-1 and Windows-1252).

With the BOM stripped and high-bit characters replaced by spaces, the parser works fine.
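A minimal sketch of that workaround, assuming the response text is available as a `std::u16string` (the name `to_7bit_ascii` is hypothetical and this is not the code actually used in the app): skip a leading U+FEFF and replace every character above 0x7F with a space.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: drop a leading byte order mark (U+FEFF) and replace
// every character outside 7-bit ASCII with a space, producing plain
// single-byte text. A surrogate pair becomes two spaces (one per code unit).
std::string to_7bit_ascii(const std::u16string& wide) {
    std::size_t start = (!wide.empty() && wide[0] == u'\uFEFF') ? 1 : 0;
    std::string out;
    out.reserve(wide.size() - start);
    for (std::size_t i = start; i < wide.size(); ++i) {
        char16_t ch = wide[i];
        out.push_back(ch < 0x80 ? static_cast<char>(ch) : ' ');
    }
    return out;
}
```

This discards information (the copyright sign simply disappears), so if the parser can be replaced or fixed, feeding it UTF-8, as suggested in the other answer, is the less lossy route.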

Jeff Dege