0

I need to convert some UTF-8 encoded numbers into floats in C++ using VS2013. Is there anything in the standard library, or in the headers Microsoft provides, that would help me do that?

Alternatively, I hear that UTF-8 is supposed to be compatible with ASCII. Is there anything that takes advantage of that?

user81993

2 Answers

3

Don't panic. For all digits and for all other characters used in floating-point numbers, UTF-8 is the same as ASCII.

UTF-8 represents Unicode characters as sequences of bytes, and these sequences have variable length. For every Unicode code point below 128 (the ASCII range), the sequence is a single byte holding that code point's value. So for your purposes there is no difference between UTF-8 and ASCII.

You can use the standard parsing functions and ignore the fact that the input is UTF-8.
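To make the single-byte property concrete, here is a minimal sketch that checks whether a UTF-8 string stays entirely in the ASCII range. The function name `is_ascii_only` is hypothetical and chosen for illustration only. It relies on the fact that every byte of a multi-byte UTF-8 sequence has its high bit set.

```cpp
#include <string>

// Hypothetical helper: returns true when every byte is below 0x80,
// meaning the UTF-8 text consists only of one-byte (ASCII) sequences.
bool is_ascii_only(const std::string& utf8) {
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        // Cast first: plain char may be signed, so bytes >= 0x80 could be negative.
        if (static_cast<unsigned char>(utf8[i]) >= 0x80)
            return false;
    }
    return true;
}
```

A string such as "-12.5e3" passes this check, while a string containing a non-ASCII digit (for example the two bytes 0xD9 0xA3, which encode U+0663) does not.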
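As a rough sketch of what that could look like, the following uses `std::strtod` on the raw bytes. The wrapper name `parse_double` is an assumption made for this example, and it assumes the default "C" locale, where the decimal separator is `.`; under another locale `strtod` may expect a different separator.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical wrapper: parses the whole string as a floating-point number.
// Returns false if nothing could be parsed or if unparsed bytes remain,
// which is also what happens for non-ASCII (multi-byte) digits.
bool parse_double(const std::string& utf8, double& out) {
    const char* begin = utf8.c_str();
    char* end = 0;
    double value = std::strtod(begin, &end);
    if (end == begin || *end != '\0')
        return false;
    out = value;
    return true;
}
```

Rejecting trailing bytes is a design choice for this sketch: it makes input containing anything other than a plain ASCII number fail visibly instead of being silently truncated.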

Hans Klünder
    These are all digits: 0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮‌​૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩‌​၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩0123456789 :) – Hans Passant Dec 28 '14 at 11:13
  • That's a lot - thanks! I never knew that there are so many of them. I have to confess that my answer is valid for the simplest of all digits only, that is, the first 10 digits of your list. – Hans Klünder Dec 28 '14 at 11:25
  • @HansPassant: do you know whether the C++ runtime has any functions that process non-ASCII digits? – Harry Johnston Dec 28 '14 at 11:33
  • Depends on the specific CRT implementation you use. But generally there's little reason to be optimistic. – Hans Passant Dec 28 '14 at 11:47
  • @HansPassant: All readers may not get the joke. So just to be clear about that, the Unicode notion of digit is not the C++ notion of digit. I.e. there's not a problem. – Cheers and hth. - Alf Dec 28 '14 at 14:00
  • **+1** for being practical about things. – Cheers and hth. - Alf Dec 28 '14 at 14:00
2

You can use the MultiByteToWideChar WinAPI function to convert the UTF-8 input to UTF-16. Example code is below.

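#include <windows.h>
#include <vector>

// Converts a null-terminated UTF-8 string to UTF-16.
// With utf16 == NULL, returns the required buffer size in WCHARs,
// including the terminating null. Returns 0 on failure or empty input.
int UTF8toUTF16(const CHAR* utf8, WCHAR* utf16) {
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (utf16 == NULL)
        return len;
    if (len > 1) {
        return MultiByteToWideChar(CP_UTF8, 0, utf8, -1, utf16, len);
    }
    return 0;
}


const CHAR* utf8str = "someutf8string";

// First call asks for the required size, second call does the conversion.
int requiredLen = UTF8toUTF16(utf8str, nullptr);
if (requiredLen > 0) {
    std::vector<WCHAR> utf16str(requiredLen, L'\0');
    UTF8toUTF16(utf8str, &utf16str.front());
    // do something with utf16str
}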

If your numbers are plain ASCII, this conversion only widens each byte to a 16-bit code unit without changing its value. But if your requirements say the input text is UTF-8, then to be safe you should do the conversion; at least, that is what I would do.

For the further conversion to a number, see the related question: atoi() with other languages
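For the common case where the converted text holds ordinary ASCII digits, a minimal sketch of the remaining step could use `std::wcstod`, the wide-character counterpart of `strtod`. The name `parse_double_wide` is hypothetical, and whether a particular CRT accepts non-ASCII digits here is not something this sketch assumes or verifies.

```cpp
#include <cwchar>
#include <string>

// Hypothetical wrapper: parses a whole wide string as a floating-point number.
// Returns false if nothing could be parsed or if unparsed characters remain.
bool parse_double_wide(const std::wstring& text, double& out) {
    const wchar_t* begin = text.c_str();
    wchar_t* end = 0;
    double value = std::wcstod(begin, &end);
    if (end == begin || *end != L'\0')
        return false;
    out = value;
    return true;
}
```

With the vector from the code above, the call could look like `parse_double_wide(std::wstring(&utf16str.front()), result)`, where `result` is a `double` declared by the caller.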

marcinj