137

I'm currently working on a Java project that is emitting the following warning when I compile:

/src/com/myco/apps/AppDBCore.java:439: warning: unmappable character for encoding UTF8
    [javac]         String copyright = "� 2003-2008 My Company. All rights reserved.";

I'm not sure how SO will render the character before the date, but it should be a copyright symbol, and is displayed in the warning as a question mark in a diamond.

It's worth noting that the character appears in the output artifact correctly, but the warnings are a nuisance and the file containing this class may one day be touched by a text editor that saves the encoding incorrectly...

How can I inject this character into the "copyright" string so that the compiler is happy, and the symbol is preserved in the file without potential re-encoding issues?

seanhodges
  • I'd be interested in knowing what bytes actually make up that copyright character, e.g. via `hexdump AppDBCore.java`. I somehow doubt it's `\u00a9`; more likely it is something that partially works for you because of your system setup. The question mark above is _used to replace an incoming character whose value is unknown or unrepresentable in Unicode_ http://hexutf8.com/?q=c2a9efbfbd20323030332d32303038204d7920436f6d70616e792e20416c6c207269676874732072657365727665642e – jar Sep 09 '16 at 21:28
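To make the byte-level question concrete, here is a minimal Java sketch. The class name `CopyrightBytes` and its helper methods are made up for illustration, and it assumes the source file was saved in a single-byte encoding such as ISO-8859-1, which the question does not confirm. It shows how the copyright sign is encoded in ISO-8859-1 and in UTF-8, and what a UTF-8 decoder produces when it meets the lone ISO-8859-1 byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CopyrightBytes {

    // Render a byte array as space-separated upper-case hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encode the copyright sign (U+00A9) in the given charset.
    public static String encodedCopyright(Charset charset) {
        return hex("\u00a9".getBytes(charset));
    }

    // Decode the single byte 0xA9 as if it were UTF-8. The String constructor
    // substitutes the replacement character for malformed input.
    public static String latin1ByteReadAsUtf8() {
        return new String(new byte[] {(byte) 0xA9}, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + encodedCopyright(StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + encodedCopyright(StandardCharsets.UTF_8));
        String decoded = latin1ByteReadAsUtf8();
        System.out.printf("0xA9 read as UTF-8 -> U+%04X%n", (int) decoded.charAt(0));
    }
}
```

The sign is the single byte A9 in ISO-8859-1 but the two bytes C2 A9 in UTF-8. A bare A9 is not valid UTF-8, so it decodes to U+FFFD, the question mark in a diamond.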

13 Answers

104

Try with: javac -encoding ISO-8859-1 file_name.java

Fernando Nah
  • 1
    I like this solution. I added "-encoding UTF-8" as a compilerarg in my ant build.xml and I still get "warning: unmappable character for encoding ASCII". If I modify it to "-encoding jjjj" it won't compile, complaining "error: unsupported encoding: jjjj", so I know it is recognizing UTF-8, but it still seems to treat .java files as ASCII. Sigh. – dfrankow Jul 03 '10 at 00:13
  • 1
    I tried the "encoding" parameter of the ant javac task, same problem. It recognizes the parameter, but then ignores it somehow. – dfrankow Jul 03 '10 at 03:04
  • 24
    @dfrankow: you have to add `` under the applicable `` call in your `Build.xml` file. This is a bad way to do it, but you have no choice. See my long comment at the top. – tchrist Nov 15 '10 at 11:07
  • I had the same problem. When I added the compilerarg to the Ant script it worked fine; I was building from a Windows command line. The strange thing is that when I built from Eclipse it worked even without the compilerarg, so it looks like Eclipse takes care of the encoding correctly. – simonC May 10 '12 at 07:46
  • This helped me on Mac OS X :) – Arun Abraham Nov 11 '12 at 08:30
  • This helped me a lot!! This website [ http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch2charset.htm ] also shows the differences between the character encoding sets, ASCII has columns 0-7, ISO-8859-1 has columns 0-7 and A-F – jp093121 Jul 30 '13 at 16:25
57

Use the "\uxxxx" escape format.

According to Wikipedia, the copyright symbol is Unicode code point U+00A9, so your line should read:

String copyright = "\u00a9 2003-2008 My Company. All rights reserved.";
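
As a minimal sketch of the escape approach (the class name `CopyrightEscape` is hypothetical), the following keeps the source file pure ASCII while the compiled string still starts with the copyright sign. It prints the code point in hex rather than the symbol itself, so the result does not depend on the console encoding.

```java
public class CopyrightEscape {

    // The source text of this literal is plain ASCII; the compiler turns
    // the escape into the single character U+00A9.
    public static final String COPYRIGHT =
            "\u00a9 2003-2008 My Company. All rights reserved.";

    public static void main(String[] args) {
        // Show the first character as a code point instead of printing it.
        System.out.printf("first char: U+%04X%n", (int) COPYRIGHT.charAt(0));
        System.out.println("rest:" + COPYRIGHT.substring(1));
    }
}
```

Because the file contains no bytes above 0x7F, it should compile identically whether javac reads it as ASCII, ISO-8859-1, or UTF-8.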
Jon Skeet
  • 13
    Be careful with \uNNNN escapes: they are processed before lexical analysis. For example, if you put the comment /* c:\unit */ into your code, it will no longer compile, because "nit" isn't a valid hex number. – Peter Štibraný Jan 21 '09 at 11:25
  • 3
    Absolutely. (This is better handled in C#, where unicode escaping is only applied in certain contexts - but then there's the dangerous \x escape sequence as well, which is awful.) – Jon Skeet Jan 21 '09 at 11:38
  • 5
    This sounds more like a band-aid than a cure. The real problem appears to be that you're telling javac to expect source files in UTF-8 when they're really in a single-byte encoding like ISO-8859-1 or windows-1252. – Alan Moore Jan 27 '09 at 01:31
  • 7
    @Alan M: In my experience, it's a lot easier to make sure you won't have a problem by keeping source files in ASCII than it is to make sure you use the right encoding *everywhere* your source might be compiled (Ant, Eclipse, IDEA etc). – Jon Skeet Jan 27 '09 at 06:38
  • The old Seven-bit Solution, sure. But character-encoding problems crop up in many other contexts, too. Every developer has to have a pretty good grasp of the issues involved. – Alan Moore Jan 28 '09 at 07:08
  • @Alan: Yes, every developer should know about encodings etc. That doesn't mean it's a good idea to cause problems where you don't have to. I prefer portable code where you don't *have* to choose the encoding (as almost everything is ASCII-friendly). – Jon Skeet Jan 28 '09 at 07:54
  • 6
    @Jon, that’s a fundamental flaw in Java; the fact that the Java source unit is encoded in UTF-8, ISO 8859-1, CP1252, MacRoman, or whatever, is treated as metadata external to the source unit that needs it. This forces you to remember to fix your ant file or Eclipse config, etc. As you rightly point out, this is absolutely the worst way to do it, because the info is fragile and easily lost. Languages that keep the metadata (encoding metadata) and the data (read: source code) together in one place are much more robust at this. It’s the only sane approach. – tchrist Nov 15 '10 at 11:04
51

If you're using Maven, set the <encoding> explicitly in the compiler plugin's configuration, e.g.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
    </plugins>
</build>
Thomas Leonard
  • This is the right approach if people are using maven to build their project, thanks for sharing. – Shamik Jul 18 '12 at 17:32
  • 3
    The javadoc plugin will also complain about the unmappable character. It's preferable to set the `project.build.sourceEncoding` property. – Emmanuel Bourg Jan 14 '14 at 07:34
  • I was already using the project.build.sourceEncoding property, but somehow it didn't map properly onto the compiler's encoding property. Setting it explicitly did the trick. – Federico Bonelli Oct 13 '14 at 12:53
39

This helped for me:

All you need to do is set an environment variable called JAVA_TOOL_OPTIONS. If you set this variable to -Dfile.encoding=UTF8, every JVM that starts will pick up this setting.

Source: http://whatiscomingtomyhead.wordpress.com/2012/01/02/get-rid-of-unmappable-character-for-encoding-cp1252-once-and-for-all/
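To check whether the variable is actually being picked up, a small sketch like the following can help. The class name `ShowDefaultEncoding` is made up for illustration. It only reports what the running JVM sees and does not change anything.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    // Summarise the encoding-related settings this JVM started with.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", JAVA_TOOL_OPTIONS=" + System.getenv("JAVA_TOOL_OPTIONS");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once with and once without JAVA_TOOL_OPTIONS set in the shell should show whether the default charset changes on your platform.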

nightlyop
  • Wow, it works. I just added this to my .bashrc and it fixed my problem. – cowboi-peng Jan 31 '18 at 09:37
  • Worked great, from command line I entered to build: `javac MyJavaFile.java -encoding utf-8 -cp .;lib\*` Then when running it, I didn't need to add that extra encoding part. – Azurespot Feb 04 '20 at 02:45
31

Put this in your .gradle file, above the Java configuration:

apply plugin: 'java'
compileJava {options.encoding = "UTF-8"}   
Alobes5
12

Most of the time this compile error occurs when compiling a Unicode (UTF-8 encoded) file. Pass the encoding explicitly:

javac -encoding UTF-8 HelloWorld.java

You can also add this compiler option in your IDE, e.g. IntelliJ IDEA: under File > Settings > Java Compiler, add it as an additional command line parameter.


-encoding encoding: Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used. (DOC)
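The effect of the option can also be tried from Java through the standard `javax.tools` compiler API. This is a sketch under stated assumptions: it needs to run on a JDK (not a bare JRE), and the class name `EncodingFlagDemo`, the method `compileWithEncoding`, and the generated `Notice` class are all hypothetical. It writes a small UTF-8 source file containing a copyright sign and compiles it with whatever `-encoding` value is passed in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class EncodingFlagDemo {

    // Write a tiny UTF-8 source file with a copyright sign in it, then compile
    // it with the given -encoding value. Returns the compiler's exit code
    // (0 means success).
    public static int compileWithEncoding(String encoding) {
        try {
            Path dir = Files.createTempDirectory("encoding-demo");
            Path source = dir.resolve("Notice.java");
            String code = "public class Notice {\n"
                    + "    String copyright = \"\u00a9 My Company\";\n"
                    + "}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler; run on a JDK");
            }
            // Null streams make the compiler use System.in, System.out, System.err.
            return compiler.run(null, null, null,
                    "-encoding", encoding,
                    "-d", dir.toString(),
                    source.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 exit code:    " + compileWithEncoding("UTF-8"));
        System.out.println("US-ASCII exit code: " + compileWithEncoding("US-ASCII"));
    }
}
```

With the matching encoding the file compiles cleanly. With a mismatched one such as US-ASCII, the compiler reports the unmappable character on standard error; whether that is a warning or an error depends on the JDK version.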

Alupotha
12

Gradle Steps

If you are using Gradle then you can find the line that applies the java plugin:

apply plugin: 'java'

Then set the encoding for the compile task to be UTF-8:

compileJava {options.encoding = "UTF-8"}   

If you have unit tests, then you probably want to compile those with UTF-8 too:

compileTestJava {options.encoding = "UTF-8"}

Overall Gradle Example

This means that the overall gradle code would look something like this:

apply plugin: 'java'
compileJava {options.encoding = "UTF-8"}
compileTestJava {options.encoding = "UTF-8"}
Luke Machowski
4

This worked for me:

<?xml version="1.0" encoding="utf-8" ?>
<project name="test" default="compile">
    <target name="compile">
        <javac srcdir="src" destdir="classes" encoding="iso-8859-1" debug="true" />
    </target>
</project>
Yuri
Dxx0
2

For those wondering why this happens on some systems and not on others (with the same source, build parameters, and so on), check your LANG environment variable. I get the warning/error when LANG=C.UTF-8, but not when LANG=en_US.UTF-8.

jakar
1

If you use Eclipse (Eclipse can put utf8 code for you even if you write a utf8 character. You will see normal utf8 character when you programming but background will be utf8 code):

  1. Select Project
  2. Right click and select Properties
  3. Select Resource (at the top of the menu that opens after step 2)
  4. In the Resource panel, under Text file encoding, select Other and choose the encoding you want

P.S.: this will work if you have static values in your code, for example String test = "İİİİİııııııççççç";

bora.oren
  • 1
    Your description of “You will see normal [a] utf8 character when you [are] programming but [the] background will be utf8 code” makes no sense. Also, see my long comment in response to the question above. – tchrist Nov 15 '10 at 11:14
  • I changed it to ISO-8859-1, but still got a compile error about "unmappable character for encoding UTF8". – pacoverflow Mar 10 '17 at 18:39
1

I had the same problem, where the character index reported in the java error message was incorrect. I narrowed it down to the double quote characters just prior to the reported position being hex 094 (cancel instead of quote, but represented as a quote) instead of hex 022. As soon as I swapped for the hex 022 variant all was fine.
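Since the reported position can be misleading, one way to track down such bytes is to scan the file for anything outside 7-bit ASCII. The following is a minimal sketch; the class name `NonAsciiFinder` and its method are made up for illustration, and the built-in sample bytes merely imitate a line with a 0x94 byte in place of a straight quote.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class NonAsciiFinder {

    // Return the offsets of all bytes outside the 7-bit ASCII range.
    public static List<Integer> nonAsciiOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) > 0x7F) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Scan the file named on the command line, or a small sample otherwise.
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {'s', '=', (byte) 0x94, 'a', '"'};
        for (int offset : nonAsciiOffsets(data)) {
            System.out.printf("offset %d: 0x%02X%n", offset, data[offset] & 0xFF);
        }
    }
}
```

Each reported offset can then be inspected in a hex editor and replaced with the intended ASCII character or a Unicode escape.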

1

If one is using Maven Build from the command prompt one can use the following command as well:

                    mvn -Dproject.build.sourceEncoding=UTF-8
5122014009
0

In our case adding -Dfile.encoding=UTF-8 to the ./gradlew test ... command line fixed the problem.

imy