On somewhat of a lark, I wanted to use a µ character in a function name in a C project. Is this just not possible? I get errors like

error: stray '\302' in program

I tried adding the options:

-fexec-charset=UTF-8
-finput-charset=UTF-8

to my build script, but I must not understand what those enable. I'm running this version of gcc:

arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]
Peter Mortensen
Travis Griggs
  • https://gcc.gnu.org/wiki/FAQ#What_is_the_status_of_adding_the_UTF-8_support_for_identifier_names_in_GCC.3F (includes a workaround of sorts) – rici Jun 02 '18 at 03:05
  • Related: *[ (and other Unicode characters) in identifiers not allowed by g++](https://stackoverflow.com/questions/12692067/and-other-unicode-characters-in-identifiers-not-allowed-by-g)* – Peter Mortensen May 01 '23 at 03:07

3 Answers

The C standard requires implementations to have the following characters in the source character set:

A-Z a-z 0-9 ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~

as well as the space character and control characters for horizontal tab, vertical tab, and form feed. It also requires some method of indicating the end of a line, although that is not necessarily an in-stream character (C 2011 [N1570] 5.2.1, paragraph 3). Implementations may extend this character set, and they may permit other characters in identifiers, but such extensions are defined by each implementation, not by the standard.

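-finput-charset=… does not change GCC's source character set. It only tells GCC which encoding the input files use; that input is then translated to GCC’s own source character set. Likewise, -fexec-charset=… only selects the encoding used for string and character constants in the compiled program. Neither option changes which characters are allowed in identifiers.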

Clang appears to accept µ as an identifier (tested on macOS and at Compiler Explorer), while GCC does not.

Peter Mortensen
Eric Postpischil
  • [It is claimed](https://stackoverflow.com/questions/12692067/and-other-unicode-characters-in-identifiers-not-allowed-by-g/42158646#42158646) to be accepted by GCC 10.1 (2020-05-07) and later. – Peter Mortensen May 01 '23 at 03:12
On most implementations of the C language, the name of a function may contain only ASCII letters, digits, and underscores, and it must not begin with a digit.

I tried this (UTF-8 encoded) program on my Mac under two different compilers:

#include <stdio.h>

double π = 3.141592654;

int main()
{
    printf("π = %f\n", π);
}

As others have reported, GCC complained about a "stray ‘\317’ in program". But Clang accepted the program and compiled it successfully; when I ran the result, I got

π = 3.141593

Bottom line: It's implementation-defined, I think.

Peter Mortensen
Steve Summit