
I want to convert the endianness of a UTF-16 character array stored as wchar_t*, assuming sizeof(wchar_t) == 2 in this case.

Converting from BE to LE and LE to BE are both needed, so ntoh/hton doesn't work.

I've read How do I convert between big-endian and little-endian values in C++? but I'm not sure how to apply it to a wchar_t.

Is there a way to swap the 2 bytes of a wchar_t? Or do I have to convert it to binary first?

EDIT: Though I didn't test all the answers, I believe they all work. That said, I think Jarod42's answer is the most straightforward.

OckhamTheRazor
  • You know `wchar_t` is 4 bytes and the encoding is UTF-32? – Deduplicator Oct 24 '14 at 20:55
  • @Deduplicator: On Windows `wchar_t` is typically 2 bytes and UTF-16 – Billy ONeal Oct 24 '14 at 21:06
  • @BillyONeal, yes I was going to post that. There is the C++ standard and then what MS actually does. I had gathered Ockham may have left off visual-c++ as a tag. – Michael Petch Oct 24 '14 at 21:08
  • @Michael: The C and C++ standards make the size of `wchar_t` implementation defined; so there's nothing standards-related here. – Billy ONeal Oct 24 '14 at 21:08
  • @Deduplicator: There's nothing stopping a Unix implementation where `wchar_t` is 2 bytes. Considering the question says "assuming `wchar_t` is 2 bytes" I think the question is more than fine. – Billy ONeal Oct 24 '14 at 21:10
  • @BillyONeal In my comment I should have made it clear that I was referring to how MS implemented it per the standards. G++ has compiler flags to define the size of a wchar_t as well. – Michael Petch Oct 24 '14 at 21:14

4 Answers


The following may help:

#include <cstdint>

// Swap the high and low bytes of a 16-bit value.
std::uint16_t swap_endian(std::uint16_t u)
{
    return static_cast<std::uint16_t>((u >> 8) | ((u & 0xFF) << 8));
}
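As a rough sketch of how this could be applied to the whole array from the question: the helper name `swap_utf16_endianness` is hypothetical, and it is written as a template so it accepts `wchar_t*` where `sizeof(wchar_t) == 2` (as the question assumes) or `char16_t*` elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

// Swap the two bytes of one 16-bit code unit (same logic as above).
inline std::uint16_t swap_endian(std::uint16_t u)
{
    return static_cast<std::uint16_t>((u >> 8) | ((u & 0xFF) << 8));
}

// Hypothetical helper: byte-swap every code unit of a UTF-16 buffer in place.
// CodeUnit is assumed to be a 2-byte character type (wchar_t on Windows, char16_t, ...).
template <class CodeUnit>
void swap_utf16_endianness(CodeUnit* text, std::size_t length)
{
    static_assert(sizeof(CodeUnit) == 2, "expects 2-byte UTF-16 code units");
    for (std::size_t i = 0; i < length; ++i)
        text[i] = static_cast<CodeUnit>(
            swap_endian(static_cast<std::uint16_t>(text[i])));
}
```

The same call converts in either direction, since swapping twice restores the original. For example, the code units 0x0041 0x20AC become 0x4100 0xAC20. Surrogate pairs need no special handling here, because endianness only affects the bytes within each code unit, not the order of the code units.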
Jarod42

Reversing the bytes of any type, no matter how long:

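#include <memory>   // std::addressof
#include <utility>  // std::swap

template<class T> void reverse_bytes(T& x) {
    // addressof returns T*, so an explicit cast is needed to view the raw bytes
    char* a = reinterpret_cast<char*>(std::addressof(x));
    for(char* b = a + sizeof x - 1; a<b; ++a, --b)
        std::swap(*a, *b);
}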
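One thing to watch when using this on a string: passing the whole array would reverse all of its bytes at once, which also reverses the order of the characters. A minimal sketch of applying it per element instead, where `reverse_bytes_of_each` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <memory>   // std::addressof
#include <utility>  // std::swap

// Reverse the bytes of a single object in place.
template <class T>
void reverse_bytes(T& x)
{
    char* a = reinterpret_cast<char*>(std::addressof(x));
    for (char* b = a + sizeof x - 1; a < b; ++a, --b)
        std::swap(*a, *b);
}

// Hypothetical helper: reverse the bytes of each element separately,
// so the element order of the buffer is preserved.
template <class T>
void reverse_bytes_of_each(T* items, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        reverse_bytes(items[i]);
}
```

With 2-byte elements this amounts to a UTF-16 endianness swap; with 4-byte elements it does the same for UTF-32.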
Deduplicator

I think this should work:

#include <utility>  // std::swap

int main()
{
    wchar_t c = L'A';

    // char* can alias anything
    char* cptr = reinterpret_cast<char*>(&c);

    if(sizeof(wchar_t) == 2)
        std::swap(cptr[0], cptr[1]);
    else if(sizeof(wchar_t) == 4)
    {
        std::swap(cptr[0], cptr[3]);
        std::swap(cptr[1], cptr[2]);
    }
}
Galik

"Converting from BE to LE and LE to BE are both needed, so ntoh/hton doesn't work."

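I would still suggest using the ntoh/hton family of functions, assuming the host is little-endian:
BE == network byte order
LE == host byte order (on a little-endian machine such as x86)

so:
For BE -> LE use: uint16_t ntohs(uint16_t netshort);
For LE -> BE use: uint16_t htons(uint16_t hostshort);

Note that on a big-endian host both functions return the value unchanged, so they cannot produce LE data there.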

s.cpp