
I'm reading user input with fgets() and checking whether it contains any disallowed symbols.

If the user types "š", for example, I notice it, because the value of "š" is higher than 127. But when the user types "ασδφ" or "жщдф", my check doesn't work, because these symbols are not stored as typed but are replaced by "?".

My code:

char input[100];
fgets(input, 100, stdin);
for (int i = 0; i < strlen(input) - 1; i++)
{
    // Check whether input[i] is an ASCII character
}

When the user types "š", the variable input contains "š". But when the user types "щ", input contains "?", and a question mark is a valid ASCII character.

How can I fix this?

EDIT:

Operating system: Windows 10

IDE: Visual Studio 2015

Code:

for (size_t i = 0; i < strlen(input); i++)
{
    printf("%c %d\n", input[i], input[i]);
    if (input[i] < 0/* || input[i] > 127*/)
    {
        error = 4;
        break;
    }
}

If I pause the program, the contents of the array input for the user input "ασδφ" are 63, 63, 63, 63, 10.
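To see exactly which bytes fgets() stored, it can help to print each byte as an unsigned hex value instead of relying on a character view. This is a minimal sketch; the helper name dump_bytes is hypothetical. A genuine '?' shows up as 3F, while UTF-8 input such as "щ" would show up as the two bytes D1 89, so the dump tells you whether the substitution happened before your program saw the input.

```c
#include <stdio.h>

/* Print every byte of s as a two-digit hex value, separated by spaces,
   followed by a newline. Converting to unsigned char first keeps bytes
   >= 0x80 from being treated as negative when plain char is signed. */
void dump_bytes(FILE *out, const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (i > 0) {
            fputc(' ', out);
        }
        fprintf(out, "%02X", (unsigned)(unsigned char)s[i]);
    }
    fputc('\n', out);
}
```

Calling dump_bytes(stdout, input) right after fgets() would print 3F 3F 3F 3F 0A for the input described above.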

EDIT 2:

Now I'm totally confused. I tried compiling and running the program on Ubuntu, and everything worked fine. But on Windows it still replaces non-ASCII symbols with question marks. Any idea how to get it to work on Windows?
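Assuming the Ubuntu terminal uses a UTF-8 locale (the usual default), each Greek or Cyrillic letter reaches fgets() as two bytes, both at or above 0x80, so a byte-by-byte check reports "ασδφ" as 8 bad bytes rather than 4 letters. A minimal sketch that counts non-ASCII characters instead of bytes, assuming the input is valid UTF-8, counts only lead bytes (11xxxxxx) and skips continuation bytes (10xxxxxx). The name count_non_ascii_utf8 is hypothetical.

```c
#include <stddef.h>

/* Count non-ASCII characters in a UTF-8 string.
   Assumes valid UTF-8: each non-ASCII character starts with exactly one
   lead byte of the form 11xxxxxx, followed by continuation bytes of the
   form 10xxxxxx, so counting lead bytes counts characters. */
size_t count_non_ascii_utf8(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) == 0xC0) {   /* lead byte of a multi-byte sequence */
            count++;
        }
    }
    return count;
}
```

This does not validate the UTF-8; malformed input would need a real decoder.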

Edward Grey
  • How are you running your code? – melpomene Mar 16 '19 at 12:32
  • Do you mean that `fgets` writes them to your buffer (the array `input`) as `'?'`? If you look at the contents byte by byte in a debugger, the actual value is the ASCII code for `'?'`? I really find that hard to believe. What operating system are you using? What kind of terminal are you running the program in? What are your locale settings? – Some programmer dude Mar 16 '19 at 12:35
  • "my code won't work," and "input will be "?" " --> post your code that does the printing. – chux - Reinstate Monica Mar 16 '19 at 12:52
  • The type `char` can be either signed *or* unsigned, it's implementation (compiler) specific. That means `input[i] < 0` will be wrong if `char` is unsigned. – Some programmer dude Mar 16 '19 at 13:14
  • Which compiler is `Visual Studio 2015` using? – Saurabh Mar 16 '19 at 15:07
  • Please read up on [Unicode](https://en.wikipedia.org/wiki/Unicode), particularly the representations of [utf-8](https://en.wikipedia.org/wiki/UTF-8) and [utf-16](https://en.wikipedia.org/wiki/UTF-16). I think Microsoft's tools are biased toward the latter. – wallyk Mar 16 '19 at 15:17
  • Try including every header file required by functions used in your program. – Saurabh Mar 16 '19 at 15:17
  • I think your _terminal_ is set to non-UTF-8; the `?` character is a stand-in for something unrecognised. The byte sequence is non-ASCII; test it with `isascii`. – Neil Mar 16 '19 at 21:11
  • When you say ASCII, you must mean the C0 Controls and Basic Latin characters that are in many character sets. Since you are asking about other characters, you must be assuming that the program will be run with a different character set and encoding. You and your users and their systems should be in alignment about how that works before interpreting the character codes with your program code. – Tom Blodget Mar 16 '19 at 22:06
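Following up on the comments about signedness and isascii: plain char may be signed or unsigned, so input[i] < 0 only catches non-ASCII bytes where char happens to be signed, and isascii() is a POSIX extension rather than ISO C. A portable sketch converts each byte to unsigned char before comparing. The function name has_non_ascii is hypothetical. Note that no check of this kind can help once the console has already replaced a character with '?', because that byte really is 0x3F.

```c
#include <stdbool.h>

/* Return true if s contains any byte outside the 7-bit ASCII range
   (0x00 to 0x7F). Converting to unsigned char first makes the test
   independent of whether plain char is signed or unsigned. */
bool has_non_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F) {
            return true;
        }
    }
    return false;
}
```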

3 Answers


I think you should use the isascii(int ch) function declared in the ctype.h header (note that isascii is a POSIX/X/Open extension, not part of ISO C):

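#include <ctype.h>   /* isascii() (POSIX, not ISO C) */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char input[100];
    if (fgets(input, sizeof input, stdin) == NULL)
        return 1;
    input[strcspn(input, "\n")] = '\0';   /* strip the trailing newline, if any */
    for (size_t i = 0; input[i] != '\0'; i++)
    {
        if (isascii((unsigned char)input[i])) {
            /* If ASCII */
        } else {
            /* If non-ASCII */
        }
    }
    return 0;
}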

You can also use this:

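#include <stdio.h>
#include <string.h>

int main(void)
{
    char input[100];
    if (fgets(input, sizeof input, stdin) == NULL)
        return 1;
    input[strcspn(input, "\n")] = '\0';   /* strip the trailing newline, if any */
    for (size_t i = 0; input[i] != '\0'; i++)
    {
        /* Convert to unsigned char so the range check works whether
           plain char is signed or unsigned. */
        if ((unsigned char)input[i] < 128)
            printf("ASCII value\n");
        else
            printf("Not an ASCII value\n");
    }
    return 0;
}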
Saurabh

It is hard to tell from the details you've provided, but I do not think the problem is in your code. The fact that it works on Ubuntu suggests that you are running into an encoding issue with your console.

If stdin is redirected from a file, this shouldn't be an issue, but it sounds like you are typing or copy/pasting input into the command prompt. The Windows console converts characters that cannot be represented in its current code page to '?' before your program receives them. See this question and the accepted answer for more information:

What encoding/code page is cmd.exe using?
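One possible way around the console code page, which goes beyond this answer and is an assumption rather than something tested here, is to read the line as wide characters. With the Microsoft C runtime, _setmode(_fileno(stdin), _O_U16TEXT) puts stdin in UTF-16 text mode so that fgetws() receives Unicode from the console; on other systems, setlocale(LC_ALL, "") lets fgetws() decode the locale's encoding (assumed to be UTF-8). The names prepare_wide_stdin and has_non_ascii_w are hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#ifdef _WIN32
#include <fcntl.h>   /* _O_U16TEXT (Microsoft CRT) */
#include <io.h>      /* _setmode, _fileno (Microsoft CRT) */
#endif

/* Call once before reading. Assumption: on Windows this switches stdin
   to UTF-16 text mode so fgetws() is not limited by the console code
   page; elsewhere it adopts the user's locale so fgetws() can decode
   multibyte (typically UTF-8) input. */
void prepare_wide_stdin(void)
{
#ifdef _WIN32
    _setmode(_fileno(stdin), _O_U16TEXT);
#else
    setlocale(LC_ALL, "");
#endif
}

/* Return 1 if the wide string contains a character above U+007F.
   The cast avoids sign issues, since wchar_t is signed on some
   platforms and unsigned on others. */
int has_non_ascii_w(const wchar_t *s)
{
    for (; *s != L'\0'; s++) {
        if ((unsigned long)*s > 0x7FUL) {
            return 1;
        }
    }
    return 0;
}
```

Usage would be prepare_wide_stdin(), then fgetws(buf, 100, stdin) on a wchar_t buf[100], then has_non_ascii_w(buf). Mixing narrow stdio calls on the same stream after switching modes is best avoided.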

Ryan