
I inherited a pretty large C code base that uses Unicode identifiers widely (UTF-8 encoded).

I have to build it with GCC (any version is fine, but I can't recompile GCC itself), and I get a lot of `error: stray ‘\302’ in program` errors (763 of them).

I cannot simply remove all the Unicode characters because that would cause name collisions. I cannot use `iconv` because a number of characters get translated to `?`, which is not valid inside an identifier (and obviously I cannot replace every `?` with a different character, because the code base also uses the ternary operator widely).

Is there a better way than manually editing every single Unicode character? The best solution would be automatic and reversible (just in case GCC finally starts supporting Unicode properly).

Note: I'm aware of the current GCC limitations and of the patch provided, but I cannot rebuild GCC. Moreover what I'm asking for is an automated way to replace all Unicode characters at once (in a revertible way, if possible), not a way to preserve them.

Peter Mortensen
Giacomo Tesio
  • http://stackoverflow.com/questions/7899795/is-it-possible-to-get-gcc-to-compile-utf-8-with-bom-source-files – Mitch Wheat Jan 26 '16 at 23:49
  • unicode identifiers: just don't do it! – chqrlie Jan 27 '16 at 00:24
  • @nneonneo this is not a duplicate: I'm aware of the current GCC limitations and of the patch provided, but I cannot rebuild gcc. Moreover what I'm asking for is an automated way to replace all unicode characters at once (in a revertible way, if possible), not a way to preserve them, like the other question you pointed out. – Giacomo Tesio Jan 27 '16 at 08:06
  • If you convert your identifiers to UCN, as in the accepted answer there, and use `-fextended-identifiers`, then you will be able to compile your code without any risk of name clashes, and without having to recompile GCC. Does that not work for you? – nneonneo Jan 27 '16 at 19:17
  • @nneonneo I feel a bit dumb, but if I have to manually change each identifier to UCN, I can adopt a more readable convention instead. Indeed, I do not know of any `sed`, `iconv` or `vim` script that can **automate** the conversion to UCN (or to anything else). Am I missing something so obvious that it does not deserve an answer? – Giacomo Tesio Jan 27 '16 at 20:32
  • The answer at the duplicate question gives a script which uses `perl` to perform the conversion automatically. – nneonneo Jan 27 '16 at 23:24
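
For illustration, the UCN conversion suggested in the comments can also be sketched in C itself. This is only a minimal sketch, not the `perl` script from the linked answer: the names `utf8_to_ucn` and `UCN_FILTER_MAIN` are made up for this example. It assumes the input is well-formed UTF-8 without a leading BOM, it does not reject overlong encodings, and it rewrites every non-ASCII sequence it sees, including those in comments and string literals. Code points that the C standard does not allow in identifiers will still be rejected by the compiler after conversion.

```c
/* utf8_to_ucn.c: rewrite non-ASCII UTF-8 sequences as C99 universal
 * character names (\uXXXX or \UXXXXXXXX).
 * Hypothetical helper written for this example; not part of GCC. */
#include <stdio.h>

/* Copy `in` to `out`, replacing each multi-byte UTF-8 sequence with a UCN.
 * Returns 0 on success, -1 on malformed or truncated UTF-8. */
int utf8_to_ucn(FILE *in, FILE *out)
{
    int c;

    while ((c = fgetc(in)) != EOF) {
        unsigned long cp;   /* decoded code point */
        int extra;          /* continuation bytes still expected */

        if (c < 0x80) {                     /* plain ASCII: copy unchanged */
            fputc(c, out);
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* 2-byte sequence */
            cp = (unsigned long)(c & 0x1F);
            extra = 1;
        } else if ((c & 0xF0) == 0xE0) {    /* 3-byte sequence */
            cp = (unsigned long)(c & 0x0F);
            extra = 2;
        } else if ((c & 0xF8) == 0xF0) {    /* 4-byte sequence */
            cp = (unsigned long)(c & 0x07);
            extra = 3;
        } else {
            return -1;      /* stray continuation byte or invalid lead byte */
        }

        while (extra-- > 0) {
            c = fgetc(in);
            if (c == EOF || (c & 0xC0) != 0x80)
                return -1;  /* truncated or malformed sequence */
            cp = (cp << 6) | (unsigned long)(c & 0x3F);
        }

        if (cp <= 0xFFFF)
            fprintf(out, "\\u%04lX", cp);   /* short form: 4 hex digits */
        else
            fprintf(out, "\\U%08lX", cp);   /* long form: 8 hex digits */
    }
    return 0;
}

#ifdef UCN_FILTER_MAIN
/* Optional driver: build with -DUCN_FILTER_MAIN to get a stdin/stdout filter. */
int main(void)
{
    return utf8_to_ucn(stdin, stdout) == 0 ? 0 : 1;
}
#endif
```

Built with something like `gcc -DUCN_FILTER_MAIN utf8_to_ucn.c -o utf8_to_ucn`, it could be run as `./utf8_to_ucn < foo.c > foo_ucn.c` (file names are placeholders), and the result compiled with `-fextended-identifiers` as suggested above. Each UCN maps back to exactly one code point, so the change should be mechanically reversible as long as the original sources contained no UCNs of their own.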

0 Answers