I've been learning C++ and have just started with C, but I ran into a strange problem that stops my code from compiling: several stray '\342' errors and similar.

I now know that those errors are caused by non-ASCII characters, but the solutions in the other posts were very simple: those people had different "" signs because they got their code through copy and paste.

I typed both of these "int main"s myself (I commented out one block or the other to test the code. Don't mind the Portuguese; it isn't relevant at all):

#include <stdio.h>
#include <stdlib.h>

/*
int main()
{
    float x;
    printf("Introduza um numero com bastantes casas decimais: \n");
    scanf("%f", &x);
    printf("Com 2 casas decimais: %.2f  \nCom todas as casas decimais %f", x, x);⁠⁠⁠⁠
    return 0;
}

*/

int main()
{
    float var;

    printf("Introduza um numero com bastantes casas decimais: \n");

    scanf("%f", &var);

    printf("Com 2 casas decimais: %.2f \nCom todas as casas decimais: %f", var, var);


}

The first block doesn't work; it gives me these errors:

||=== Build: Debug in ExplicAna (compiler: GNU GCC Compiler) ===|
/home/meneses/Cê/ExplicAna/main.c||In function ‘main’:|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\342’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\201’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\240’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\342’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\201’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\240’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\342’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\201’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\240’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\342’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\201’ in program|
/home/meneses/Cê/ExplicAna/main.c|10|error: stray ‘\240’ in program|

on this line:

printf("Com 2 casas decimais: %f  \nCom todas as casas decimais %f", x, x);⁠⁠⁠⁠

I erased that line three times and rewrote it, but nothing changed. The " looks exactly the same.

I then wrote the other block, which is exactly the same, and to my surprise it worked!

What am I missing?

I'm worried because this is my first time compiling C, and I'm afraid the same thing might cause trouble again in the future.

  • 3
    Save the file in ANSI encoding. – 2501 Apr 05 '16 at 19:43
  • 2
    It's not unknown for copy/paste to put rogue characters into the text file which you can't see in the text editor. Solution is to retype the lines. – Weather Vane Apr 05 '16 at 19:45
  • I didn't copy paste at all. @WeatherVane How do I do that? I'm using code::blocks – Diogo Meneses Apr 05 '16 at 19:48
  • That's not what you said in the question. *"they got the code through copy paste"* – Weather Vane Apr 05 '16 at 19:50
  • @WeatherVane I copied that line from the first block into the second and it worked, so I don't think there are hidden chars – Diogo Meneses Apr 05 '16 at 19:50
  • "the rest of the posts were very simple, they " THEY, not me, I meant the people who were having the same stray errors :) – Diogo Meneses Apr 05 '16 at 19:51
  • I think you can rule out compiler bugs for a simple four-line main. – Weather Vane Apr 05 '16 at 19:53
  • Maybe use a hex-editor to search for invalid chars. – K. Biermann Apr 05 '16 at 19:54
  • @WeatherVane yeah this is leaving me really wtf! It might actually be rogue chars, before I didn't copy the "x, x);" part, now I did and changed the x to var and now it's giving me the same problem. – Diogo Meneses Apr 05 '16 at 19:55
  • Is this a question about compilers, editors, your friends or copy paste? Why not start with a new file, and type in what you need. – Weather Vane Apr 05 '16 at 19:56
  • Nope, I just erased that part and rewrote it: same errors, unless the backspace isn't erasing them, which I doubt... – Diogo Meneses Apr 05 '16 at 19:56
  • 1
    Because I'm afraid I'll have the same problem in a more severe situation in the future and I wanted to find out what I'm missing, that's how I learn the best – Diogo Meneses Apr 05 '16 at 19:57
  • Delete your previous files. Start with a new file and scrap everything else. Type from scratch. Don't copy anything. If that doesn't work, get a new text editor. – Weather Vane Apr 05 '16 at 19:58
  • Did you read my post? I solved it already before making this post... thanks for your help anyway. – Diogo Meneses Apr 05 '16 at 20:02
  • I wanted to make sure I wasn't making any syntax errors, since it is my first time playing with C. – Diogo Meneses Apr 05 '16 at 20:04
  • @DiogoMeneses: Which editor did you use to write the files? Perhaps some setting in that editor caused the `U+2060` to be added; perhaps, at one point, the line was long, and you did something to stop that line from being split? In general, if it was just a text editor as opposed to a word processor, that should not happen. I'm only asking because I'm so surprised about this. – Nominal Animal Apr 07 '16 at 14:44
  • The line was never long! I'm using code::blocks on Linux Mint 17.3 – Diogo Meneses Apr 07 '16 at 18:22
  • Retyping and deleting files is not necessary at all. A direct analysis is: 342 201 240 (octal) → 0xE2 0x81 0xA0 (hexadecimal) → UTF-8 sequence for Unicode code point U+2060 ([WORD JOINER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8064)). This can be searched for (and replaced) using the regular expression `\x{2060}` (the notation is different in Visual Studio Code (and probably others): `\u2060`) – Peter Mortensen Apr 28 '23 at 20:59
  • Or in other words, it is possible to stay entirely rational when it comes to this class of errors (guesswork is not required). The crucial realisation is that one Unicode character gives rise to ***three*** "stray" errors (though only two for the common U+00A0 ([NO-BREAK SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128))). The three octal numbers (sometimes decimal) can be converted to hexadecimal and the Unicode character can be searched for (as the UTF-8 sequences are usually listed in hexadecimal). – Peter Mortensen Apr 28 '23 at 21:04
  • cont' - The offending character can then be searched for (and replaced) by a regular expression in any modern text editor or IDE. There isn't any need to visually look for the character. This is impossible anyway for the invisible ones (e.g., [EM SPACE](https://www.charset.org/utf-8/9) and [ZERO WIDTH SPACE](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128)). Even the zero width space can be fixed in this manner. – Peter Mortensen Apr 28 '23 at 21:14
  • cont' - To search for the most common ones, use this regular expression (e.g., in [Geany](https://pmortensen.eu/world2/2020/03/29/using-geany/). In [Visual Studio Code](https://en.wikipedia.org/wiki/Visual_Studio_Code) the notation is different; see above): `\x{00A0}|\x{2003}|\x{2009}|\x{200B}|\x{200C}|\x{FEFF}|\x{2013}|\x{2014}|\x{2029}|\x{201C}|\x{201D}|\x{2060}|\x{2212}|\x{00E4}|\x{FFFC}|\x{FFFD}|\x{2217}|\x{200C}|\x{202B}|\x{202A}|\x{FF1A}|\x{21B5}` – Peter Mortensen Apr 28 '23 at 21:17
  • U+2060 (WORD JOINER) is in fact one of the invisible ones. But it is entirely possible to replace it using the method described here (and hopefully in the future in the canonical question). – Peter Mortensen Apr 28 '23 at 21:22
  • This is a ***very*** common error when copying code from web pages, [PDF](https://en.wikipedia.org/wiki/Portable_Document_Format) documents, through chat (e.g. [Skype Chat](https://en.wikipedia.org/wiki/Features_of_Skype#Skype_chat) or [Facebook Messenger](https://en.wikipedia.org/wiki/Facebook_Messenger)), etc. The canonical question is *[Compilation error: stray ‘\302’ in program, etc.](https://stackoverflow.com/questions/19198332)*. – Peter Mortensen Apr 28 '23 at 21:22

1 Answer

After changing var to x so that both code lines match, the second (working) line ends with the following octets:

 x  , sp  x  )  ; lf
78 2c 20 78 29 3b 0a

However the first one (broken) ends with:

 x  , sp  x  )  ;                                     lf
78 2c 20 78 29 3b e2 81 a0 e2 81 a0 e2 81 a0 e2 81 a0 0a

In other words, sandwiched between the semicolon and the linefeed you have:

e2 81 a0 e2 81 a0 e2 81 a0 e2 81 a0

You indeed have hidden characters in your first code line after the semi-colon, but before the linefeed, which your compiler is rightly bitching about.

WhozCraig
  • Thanks, I've come to realize the problem was here: ", x, x);" but I don't know how those hidden chars got there since I wrote everything, twice even. Didn't know what hexeditors were, thanks, I'll try to learn how to use them. – Diogo Meneses Apr 05 '16 at 20:06
  • 1
    For what it's worth, those are the UTF-8 representation of U+2060, ["Word Joiner"](https://en.wikipedia.org/wiki/Word_joiner). I have no idea how you would get one of those into a file, never mind four of them. – rici Apr 05 '16 at 20:12
  • @rici no kidding ? Who knew? (besides you =P). My guess is there was originally one, but after several rounds of cutting, pasting, deleting visible chars, etc, they multiplied. – WhozCraig Apr 05 '16 at 20:13
  • @WhozCraig: `unicode $'\xe2\x81\xa0'` will tell you all, provided you have the `unicode` utility. – rici Apr 05 '16 at 20:14
  • I'm on Linux; does that make a difference? What is `unicode` for? I can install it through the terminal if I wish. – Diogo Meneses Apr 05 '16 at 20:27
  • @DiogoMeneses [Unicode](https://en.wikipedia.org/wiki/Unicode) and [this additional link](http://www.joelonsoftware.com/articles/Unicode.html) are worthy of reading. – WhozCraig Apr 05 '16 at 20:31
  • U+00A0 "non-breaking space", `\302\240`, is easy to mistype in Nordic keyboard layouts, as it is generated by AltGr+Space, and AltGr is also used to type {, [, ], } and others. But I'm totally puzzled as to how you'd get U+2060, as it only prohibits line break at that point. Which keyboard layout are you using, Diogo Meneses? – Nominal Animal Apr 05 '16 at 20:31
  • This one https://upload.wikimedia.org/wikipedia/commons/thumb/2/2c/KB_Portuguese.svg/2000px-KB_Portuguese.svg.png – Diogo Meneses Apr 05 '16 at 20:34
  • Uh I used xbindkeys to be able to ctrl + alt + 0 to get same output as altgr + 0, do you think that was what caused trouble? I've used it a lot before though – Diogo Meneses Apr 05 '16 at 20:35
  • 1
    @DiogoMeneses: No, I just suspect that you have U+2060 at e.g. Ctrl+Alt+Enter or Shift+Enter or some such combination, that is easy to mistype. Install and run `xev`, and see if any such modified keypresses near `;` or Enter produce a line with `XLookupString gives 3 bytes: (e2 81a0) ""`. If you find it, it's easy to disable; you only need the `keycode` and `keysym` on the same report. – Nominal Animal Apr 05 '16 at 20:57
  • Can't find it! Nothing goes past 2 bytes. – Diogo Meneses Apr 07 '16 at 14:13