
I'm having trouble executing the following code:

#include <iostream>
#include <regex>
#include <string>

int main(int argc, char** argv) {

    std::wstring buffer; // Buffer string for input

    std::wregex integerRegex(L"^-?[0-9]+$"); // Regex for integers (123, -123, etc.)

    while (true) {

        std::wcout << L"Enter your value:\n";
        std::wcin >> buffer; // Read a string from the keyboard to determine whether it is an integer

        // Check if integer or not
        if (std::regex_match(buffer, integerRegex)) {

            std::wcout << L"Integer!\n";

        } else {

            std::wcout << L"Unknown :(\n";

        }
    }

    return 0;
}

This code should output Integer! if the entered sequence is an integer, or Unknown :( if not. But in some cases I get false positives: when I enter something like -234а, where а is a Cyrillic character, the code above says it's an integer, but it's not. Other Cyrillic characters don't cause this problem.

The compiler is TDM-GCC 5.1.0, with the following flags:

-std=c++11 -w -Wall -Wextra -pedantic -Werror -pg -pipe

Can someone explain what the root of the problem is and who is at fault?

Edited by Cœur
Asked by PRIGORYAN
  • This may be a clue: http://ideone.com/P8R3L4 – Oliver Charlesworth Aug 14 '16 at 11:30
  • You don't check for errors after reading the input. How do you know it contains valid data? – Galik Aug 14 '16 at 11:31
  • @OliverCharlesworth any idea why it doesn't record the `a`? – Thomas Ayoub Aug 14 '16 at 11:35
  • I'm afraid I don't know - I'm not super-familiar with wide-character behaviour. However, it does mean you can simplify your question! (Nothing to do with regexes, etc.) – Oliver Charlesworth Aug 14 '16 at 11:35
  • On my system when I add error checking the `OS` reports `"Invalid or incomplete multibyte or wide character"`. My guess is you need to convert to wide-char from a multibyte character set (likely `utf-8`). Otherwise you may be getting a wide char version of `utf-8`. – Galik Aug 14 '16 at 11:39
  • @OliverCharlesworth yes, you are right. I forgot to mention that this code does not react to a single Cyrillic 'a' and keeps waiting for more input. But what's the problem? Why is this happening? – PRIGORYAN Aug 14 '16 at 11:42
  • @Galik so you're saying the problem might be in the console? By the way, I'm running this on Win XP. I'll try it on Win 10 later. – PRIGORYAN Aug 14 '16 at 11:44
  • @Galik and why does it work fine for other Cyrillic characters? – PRIGORYAN Aug 14 '16 at 11:46
  • @PRIGORYAN I can't comment on `Windows` systems. I can fix the problem on `Linux` by converting from `UTF-8` to `UCS` wide characters. – Galik Aug 14 '16 at 11:49
  • @Galik the solution from mweerden worked fine, but thank you too for the ideas. I'm trying to find a standard C++ way to do this on all supported platforms. – PRIGORYAN Aug 14 '16 at 17:22
  • @mweerden's solution works for me too. I suspect though (I can't test this everywhere) that it may be more portable to use `std::setlocale(LC_ALL, "");` rather than changing the console settings. That should (I think) select the correct locale stream converters for whatever the current console is using. But I could be wrong :) – Galik Aug 14 '16 at 17:32
  • @Galik anyway, this code is just a test. I'm trying to write a parser for my future master's degree, and it will read from files. So it should not be a problem (I think) :D – PRIGORYAN Aug 14 '16 at 17:42

1 Answer


It seems wcin is trying to read the input as ASCII (the default "C" locale, which typically only handles ASCII). A non-ASCII character makes the conversion fail and puts the stream into an error state. Adding something like the following should solve it:

std::setlocale(LC_ALL, "C.UTF-8");

Or on Windows:

SetConsoleCP(CP_UTF8);

Here is some more information: "What most correct way to set the encoding in C++?"

However, as someone mentions in that post, you shouldn't really hard-code the locale like that. Instead, you should work with whatever locale the user has set. To use that locale, you can call:

std::setlocale(LC_ALL, "");
Edited by Community
Answered by mweerden
  • Added your suggestion at the beginning of main(...) and nothing changed on XP. But here it finally worked: http://ideone.com/5PKiFq Anyway, thanks. I think it will work on modern systems; I will test this on Win 10 later. – PRIGORYAN Aug 14 '16 at 12:13
  • @PRIGORYAN It might only work on *nix systems. I've added some more information for Windows. – mweerden Aug 14 '16 at 12:18
  • Actually, I want a cross-platform solution: something from the C++ standard, not specific to the current system. So, as I understand it, the solution is to parse the input strings manually before using the regex. – PRIGORYAN Aug 14 '16 at 12:23
  • @PRIGORYAN I think using setlocale should be pretty cross-platform if you use the right locales: https://msdn.microsoft.com/en-us/library/x99tb11d.aspx That said, I added my answer to suggest that you shouldn't explicitly set it, but use the user's locale. – mweerden Aug 14 '16 at 12:44
  • Okay, got it. I'm planning to read from files instead of console input in the future, so it should not be a problem. Many thanks for your help. – PRIGORYAN Aug 14 '16 at 13:12