
I am a bit new to character set encodings. I have an Ant build script that compiles my Java code with ISO-8859-1 encoding. It was working fine.

After reading a couple of articles (for example, "How do I convert between ISO-8859-1 and UTF-8 in Java?"), I changed the encoding in the build script to UTF-8. Since then, compilation has been failing.

The error thrown is:

[javac] TestEncoding.java (at line 11)
[javac] case '?' :
[javac] ^^^^^^^^

My build script is as follows:

<javac compiler="org.eclipse.jdt.core.JDTCompilerAdapter"
destdir="bin" debug="true" deprecation="on" encoding="iso-8859-1"
source="1.6" target="1.6"
debuglevel="lines,source" failonerror="false" errorProperty="buildFailed">
<compilerarg line="-warn:+raw" />
<compilerarg line="-warn:-serial" />
<compilerarg line="-log source/testapp/compileLog.xml" />
<src path="testapp" />
<classpath refid="application.classpath" />
</javac>

One of the classes that fails to compile contains the following code:

public class TestEncoding {
public static final String filterAccent(String s) {
    StringBuffer sb = new StringBuffer();
    int n = s.length();

    for (int i = 0; i < n; i++) {
        char c = s.charAt(i);
        switch (c) {
        case 'á':
            sb.append("a");
            break;
        case 'à':
            sb.append("a");
            break;
        case 'ã':
            sb.append("a");
            break;
        case 'À':
            sb.append("A");
            break;
        case 'â':
            sb.append("a");
            break;
        case 'Â':
            sb.append("A");
            break;
        case 'ä':
            sb.append("a");
            break;
        case 'Ä':
            sb.append("A");
            break;
        case 'å':
            sb.append("a");
            break;
        case 'Å':
            sb.append("A");
            break;
        case 'ç':
            sb.append("c");
            break;
        case 'Ç':
            sb.append("C");
            break;
        case 'é':
            sb.append("e");
            break;
        case 'É':
            sb.append("E");
            break;
        case 'è':
            sb.append("e");
            break;
        case 'È':
            sb.append("E");
            break;
        case 'ê':
            sb.append("e");
            break;
        case 'Ê':
            sb.append("E");
            break;
        case 'ë':
            sb.append("e");
            break;
        case 'Ë':
            sb.append("E");
            break;
        case 'í':
            sb.append("i");
            break;
        case 'ì':
            sb.append("i");
            break;
        case 'ï':
            sb.append("i");
            break;
        case 'î':
            sb.append("i");
            break;
        case 'Ï':
            sb.append("I");
            break;
        default:
            sb.append(c);
            break;
        }
    }
    return sb.toString();
   }
}

I also tried changing the character set to UTF-16, but this time it threw different errors:

build.xml:152: com.ibm.team.repository.common.validation.PropertyConstraintException: Validation errors for item: type = CompilePackage, itemId = [UUID _ORXiULV3Eea3M7KtSY0KHw]
    Value of attribute "compileSources.errors.sourceText" is 67854 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 58296 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 36105 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 127899 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 155844 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 120795 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 81561 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 33264 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 35163 bytes, which is greater than the allowed encoded length of 32768 bytes.
    Value of attribute "compileSources.errors.sourceText" is 96396 bytes, which is greater than the allowed encoded length of 32768 bytes.
    at com.ibm.team.repository.service.internal.RdbRepositoryDataMediator.failIfNecessary(RdbRepositoryDataMediator.java:456)
    at com.ibm.team.repository.service.internal.RdbRepositoryDataMediator.validateItem(RdbRepositoryDataMediator.java:405)

Can someone help with this?

Thanks and Regards,

Vijay Reddy.

  • Make sure that the `encoding` attribute of the `javac` task matches the real encoding of your source files. – Arnaud Nov 28 '16 at 15:44
  • My source files are encoded in CP-1252. Using ISO-8859-1 works fine, whereas UTF-8 does not. – Vijay Reddy Nov 28 '16 at 16:56
  • CP-1252 and ISO-8859-1 are very close encodings; most characters are represented the same way. Try to encode your source files in UTF-8, and specify UTF-8 as the `encoding` attribute. – Arnaud Nov 28 '16 at 16:59
  • @Berger The source code is compiled on a RHEL server. In one article I read that the default encoding on Linux machines is UTF-8. Is there an equivalent encoding for ISO-8859-1? Trying the option you mention is a huge task. As I am only a build engineer, I could not do it myself: many sources use the same encoding, and I can't ask the developers to change all of them. – Vijay Reddy Nov 29 '16 at 10:12
  • I didn't get what you changed to UTF-8 and how. Was it your source code, or something else? – hagrawal7777 Nov 29 '16 at 14:31
  • The encoding was changed in the javac task of the Ant build script, not in the source code. – Vijay Reddy Nov 30 '16 at 10:41
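
The mismatch discussed in the comments above can be reproduced in a few lines. This is only a minimal sketch: the class name `EncodingMismatchDemo` and the helper `misread` are made up for illustration, and the accented character is written as a Unicode escape so that the example itself does not depend on the encoding of the file it is saved in. It encodes a line of source as ISO-8859-1 bytes (the same bytes CP-1252 uses for 'á') and then decodes those bytes as UTF-8, which is roughly what a compiler does when `encoding="UTF-8"` is set but the files are still CP-1252.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /**
     * Hypothetical helper: encodes the text as ISO-8859-1 bytes, then decodes
     * those bytes as if they were UTF-8, simulating the wrong encoding setting.
     */
    static String misread(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        // new String(byte[], Charset) replaces malformed input with U+FFFD
        // instead of throwing an exception.
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00E1 is 'á'; in ISO-8859-1 it is the single byte 0xE1,
        // which is not a complete UTF-8 sequence on its own.
        String sourceLine = "case '\u00E1':";
        String decoded = misread(sourceLine);

        System.out.println("still contains the accented char: "
                + (decoded.indexOf('\u00E1') >= 0));
        System.out.println("contains the replacement char:    "
                + (decoded.indexOf('\uFFFD') >= 0));
    }
}
```

Pure ASCII text survives this round trip unchanged, which is presumably why only the classes containing accented literals fail to compile.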

1 Answer

I tried multiple things at the encoding level in the build script. Nothing worked.

Finally I tried Berger's suggestion: change the encoding of the source files themselves to UTF-8 and then build with UTF-8. Now everything works fine. The only thing I had to pay attention to was the special characters used in the project. As soon as the encoding was changed at the project level, the special characters were changed to ?? symbols, and I had to convert all of these back to the actual special characters. That was the only effort I needed to spend on it. This could be a messy job for a developer, but since it is a one-time activity per project, it should be OK.
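
For anyone who wants to avoid repairing the ?? symbols by hand, the conversion can also be done on the bytes of the files, decoding them with the old encoding and writing them back as UTF-8. The following is only a sketch under some assumptions: the class name `SourceTranscoder` is made up, the sources are assumed to be CP-1252 (`windows-1252`) as mentioned in the comments, the default directory `testapp` is taken from the `<src path>` in the build script, and it needs Java 8 or later to run (independent of the 1.6 target of the build). It rewrites files in place, so run it once on a clean checkout; running it a second time would corrupt the accented characters.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SourceTranscoder {

    /** Re-encodes one file in place from the 'from' charset to the 'to' charset. */
    static void transcode(Path file, Charset from, Charset to) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        String text = new String(raw, from);   // decode with the real, old encoding
        Files.write(file, text.getBytes(to));  // write back in the new encoding
    }

    public static void main(String[] args) throws IOException {
        // Assumption: source root defaults to "testapp", as in the build script.
        Path root = Paths.get(args.length > 0 ? args[0] : "testapp");
        // Assumption: the existing sources are CP-1252.
        Charset from = Charset.forName("windows-1252");

        // Collect the .java files first, then rewrite them.
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(p -> p.toString().endsWith(".java"))
                          .collect(Collectors.toList());
        }
        for (Path source : sources) {
            transcode(source, from, StandardCharsets.UTF_8);
        }
        System.out.println("Converted " + sources.size() + " file(s) under " + root);
    }
}
```

After the conversion, the `encoding` attribute of the `javac` task has to be set to UTF-8 as well, so that it matches the files again.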

Thanks Berger for the suggestion.

Vijay Reddy