I have a file that contains Japanese characters, and I want to know whether a line consists only of Katakana characters, without using QtCore.

trial.txt contains:

こにちわ
おはよう
ナルト

I want the program to report that the third line consists entirely of Katakana characters.

The file is saved as "UTF-8 Unicode text, with CRLF line terminators".

If you think this is a duplicate question, please comment with a link to the ANSWERED question.

/*
Unicode Ranges:
3040 — 309F     Hiragana
30A0 — 30FF     Katakana
*/
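As a minimal sketch, the ranges above can be turned into two predicates on a decoded code point. The function names are hypothetical, and the sketch assumes the text has already been decoded from UTF-8 into individual code points (it does not do the decoding itself).

```cpp
// Hypothetical helper: true if the code point is in the Katakana block
// U+30A0..U+30FF listed above.
bool isKatakanaCodePoint(char32_t c) {
    return c >= 0x30A0 && c <= 0x30FF;
}

// Hypothetical helper: true if the code point is in the Hiragana block
// U+3040..U+309F listed above.
bool isHiraganaCodePoint(char32_t c) {
    return c >= 0x3040 && c <= 0x309F;
}
```

Half-width Katakana (U+FF65..U+FF9F) lives in a separate block and is not covered by these ranges.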

I am using C++ with Visual Studio 2013 and gcc 4.8.3, and my current code page is Unicode (UTF-8 with signature). Prefixes like u8 don't work (I don't know why; they should).

R.A.

1 Answer

I adapted two code snippets that I found while researching this.

I decided to take Joachim Pileborg's advice and decode the file to UTF-32, then used the decimal values of the UTF-32 code points to set the Katakana range.

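        // Conversion adapted from http://en.cppreference.com/w/cpp/locale/wstring_convert/converted
        #include <codecvt>
        #include <cstdint>
        #include <fstream>
        #include <iostream>
        #include <locale>
        #include <string>
        using namespace std;

        // Assumption: the original snippet does not show writeFile; it is taken
        // here to be an output stream opened elsewhere ("out.txt" is a placeholder).
        ofstream writeFile("out.txt");

        void utf8ToUtf32(const string& utf8) {
            // The standard UTF-8 <-> UTF-32 conversion facet
            wstring_convert<codecvt_utf8<char32_t>, char32_t> cvt;

            // UTF-8 to UTF-32 (throws std::range_error on invalid input)
            u32string utf32 = cvt.from_bytes(utf8);

            // Printing of code point values inspired by http://www.cs.ucr.edu/~cshelton/courses/cppsem/strex.cc
            cout << utf32.length() << ": ";
            for (char32_t c : utf32) {
                // Cast so the value is printed as a number, not as a character type
                cout << hex << static_cast<uint32_t>(c) << ' ';   // hex on the console
                writeFile << static_cast<uint32_t>(c) << ' ';     // decimal in the file

                // Katakana block: U+30A0..U+30FF, i.e. 12448..12543 in decimal
                if (c >= 12448 && c <= 12543) cout << "k ";
            }
            cout << dec << endl;
            writeFile << endl;
        }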

I know there may be other ways to do this, but within the time frame I have, this is good enough.
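For the whole-line check the question asks about, here is a rough sketch rather than a definitive implementation. It swaps in a small hand-written UTF-8 decoder instead of wstring_convert, in case the standard library at hand does not ship `<codecvt>`. The names `decodeUtf8` and `isAllKatakana` are hypothetical. The sketch assumes each line was read with std::getline, so a trailing '\r' from the CRLF terminators and a leading UTF-8 signature (EF BB BF) may be present, and it treats an empty line as not Katakana.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes one UTF-8 sequence starting at s[i].
// On success it stores the code point in cp, advances i past the sequence,
// and returns true. It rejects bad lead bytes, truncated sequences, and bad
// continuation bytes, but does not reject overlong encodings.
bool decodeUtf8(const std::string& s, std::size_t& i, char32_t& cp) {
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    if (b0 < 0x80)                { cp = b0;        len = 1; }
    else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
    else return false;                          // invalid lead byte
    if (i + len > s.size()) return false;       // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return true;
}

// Hypothetical helper: true if every code point of the UTF-8 line lies in
// the Katakana block U+30A0..U+30FF.
bool isAllKatakana(std::string line) {
    // Drop the '\r' that std::getline leaves behind on CRLF files.
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
    std::size_t i = 0;
    // Skip the UTF-8 signature (byte order mark) if the line starts with it.
    if (line.compare(0, 3, "\xEF\xBB\xBF") == 0) i = 3;
    if (i == line.size()) return false;         // nothing left to check
    while (i < line.size()) {
        char32_t cp = 0;
        if (!decodeUtf8(line, i, cp)) return false;
        if (cp < 0x30A0 || cp > 0x30FF) return false;
    }
    return true;
}
```

Called on each line read from trial.txt, `isAllKatakana` would be expected to return true only for the third line (ナルト).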

R.A.