
I have a USB string descriptor in a uint8_t array. For example:

0000:12 03 34 00 45 00 36 00 31 00 42 00 43 00 30 00 ..4.E.6.1.B.C.0.
0010:30 00                                           0.

(The first two bytes are the length and descriptor type; the remaining bytes are the uint16_t characters.)

I would like to print this on the terminal with as little hassle as possible, and preferably without having to rework all the other printing (which happens via cout << "Hello, world" << endl;).

In particular, I would like to do:

cout << "Serial number is: " << some_cast_or_constructor( buf + 2, len - 2 ) << endl;

and for the string descriptor above, get the following on a terminal:

Serial number is: 4E61BC00

Is this possible, or do I have to delve into Unicode arcana?

[edit to add:]

Per @PaulMcKenzie, I tried this program:

#include <iostream>
#include <fstream>
#include <exception>
#include <string>
#include <locale>

int
main( int argc, char **argv )
{
    char    buf[] = { 34, 00, 45, 00, 36, 00, 31, 00, 42, 00, 43, 00, 30, 00, 30, 00 };

    std::wcout << "Hello" << std::wstring( (const wchar_t *)buf, sizeof(buf) ) << std::endl;

    return 0;
}

The output:

user:/tmp$ g++ foo.cc
user:/tmp$ ./a.out 
Hello??????????
user:/tmp$ 
Dave M.
  • Use `std::wcout`, not `std::cout`. – PaulMcKenzie Oct 17 '16 at 03:23
  • Do you know what are the `uint16_t` bytes? e.g UTF-16? – Mine Oct 17 '16 at 03:27
  • I don't know for sure...it's USB code that I wrote, but the descriptors are defined as assembly-language `.string16 "abcd"`. The hexdump is exactly what I have in the memory buffer. I tried std::wcout (per @PaulMcKenzie) but I get a bunch of ? marks. – Dave M. Oct 17 '16 at 03:31
  • [Works for Visual Studio 2015](http://rextester.com/QKF99172) – PaulMcKenzie Oct 17 '16 at 03:54
  • No luck on Linux (Debian), gcc-4.9.2. On MacOSX, I get `Hello[nothing]`. I guess it's time for some digging. (It could easily be a terminal problem, I guess.) – Dave M. Oct 17 '16 at 04:21

2 Answers

1

In your source code, I see two errors:

1- In your USB raw data (at the top), the values are hexadecimal, but in your buf[] they are decimal. It should be written:

    char    buf[] = { 0x34, 0x00, 0x45, 0x00, 0x36, 0x00, 0x31, 0x00, 0x42,
                      0x00, 0x43, 0x00, 0x30, 0x00, 0x30, 0x00 };

2- In your print message, the length passed is sizeof(buf), but that counts 'char' elements (1 byte each), not 'wchar_t' elements (2 bytes each on Windows). It should be written:

std::wcout << "Hello" << std::wstring( (const wchar_t *)buf, (sizeof(buf) >> 1) ) << std::endl;

And this code gives the expected result on a Windows PC... be sure no big/little-endian conversion is needed before treating the buffer as 'wchar_t' on your computer.

Could you check sizeof(wchar_t) under Linux? The post 'Difference and conversions between wchar_t for Linux and for Windows' suggests that wchar_t is a 32-bit value there.

J. Piquard
  • Oops...decimal instead of hex is a dumb mistake! Cut & paste from a hexdump doesn't work exactly right. However, even with your corrections, it doesn't work on g++/Linux (I also tried swapping endianness by moving the 0 byte from the end of the array to the beginning). I guess I'll have to learn more about multibyte characters and I/O. – Dave M. Oct 17 '16 at 15:12
  • Ha! I just started poking around with this, and the very first thing I did was print out sizeof(wchar_t). It's 4, so that's my first problem. USB uses UNICODE (per USB-2.0 sec. 9.6.7) but all I really know about it is every example I've seen uses .string16. I guess it's time to learn how UNICODE _really_ works! – Dave M. Oct 18 '16 at 01:50
  • [GCC/libstdc++ docs about character set conversions](https://gcc.gnu.org/onlinedocs/libstdc++/manual/facets.html#std.localization.facet.codecvt) – Dave M. Oct 18 '16 at 02:07
  • The simple implementation detail of wchar_t's size seems to repeatedly confound people. Many systems use a two byte, unsigned integral type to represent wide characters, and use an internal encoding of Unicode or UCS2. (See AIX, Microsoft NT, Java, others.) Other systems, use a four byte, unsigned integral type to represent wide characters, and use an internal encoding of UCS4. (GNU/Linux systems using glibc, in particular.) The C programming language (and thus C++) does not specify a specific size for the type wchar_t. Thus, portable C++ code cannot assume a byte size (or endianness) either. – Dave M. Oct 18 '16 at 02:10
0

If you've reached this question because you're having trouble with Unicode, wide characters and similar on Linux, the quickest way I found to move forward is to use libiconv. The <codecvt> header file that you'll read about in C++ docs is not yet implemented in GNU libstdc++ (as of October 2016).

Here is a quick sample program that demonstrates libiconv:

#include <iostream>
#include <locale>
#include <cstdint>
#include <iconv.h>
#include <string.h>

int
main( int, char ** )
{
    const char       a[] = "ABC";
    const wchar_t    b[] = L"ABC";
    const char       c[] = u8"ABC";
    const char16_t   d[] = u"ABCDEF";
    const char32_t   e[] = U"ABC";
    iconv_t          utf16_to_utf32 = iconv_open( "UTF-32", "UTF-16" );
    wchar_t          wcbuf[32];
    char            *inp = (char *)d;
    size_t           inl = sizeof(d);
    char            *outp = (char *)wcbuf;
    size_t           outl = sizeof(wcbuf);

    iconv( utf16_to_utf32, &inp, &inl, &outp, &outl );

    std::wcout << "sizeof(a) = " << sizeof(a) << ' ' << a << std::endl
               << "sizeof(b) = " << sizeof(b) << ' ' << b << std::endl
               << "sizeof(c) = " << sizeof(c) << ' ' << c << std::endl
               << "sizeof(d) = " << sizeof(d) << ' ' << d << std::endl
               << "sizeof(e) = " << sizeof(e) << ' ' << e << std::endl
               << "Converted char16_t to UTF-32: " << std::wstring( wcbuf, (wchar_t *)outp - wcbuf ) << std::endl;

    iconv_close( utf16_to_utf32 );

    return 0;
}

Resulting output:

user@debian:~/code/unicode$ ./wchar 
sizeof(a) = 4 ABC
sizeof(b) = 16 ABC
sizeof(c) = 4 ABC
sizeof(d) = 14 0x7ffefdae5a40
sizeof(e) = 16 0x7ffefdae5a30
Converted char16_t to UTF-32: ABCDEF
user@debian:~/code/unicode$ 

Note that std::wcout doesn't print char16_t or char32_t properly. However, you can use iconv to convert UTF-16 (which is apparently what you get from u"STRING") to UTF-32 (which is apparently compatible with wchar_t on a late-model Linux system).

Dave M.