
Simple thing: I want to be able to break builds via checkstyle, if selected files (.java, .xml) are not encoded properly (I want to enforce UTF-8 in source files).

I'm currently using checkstyle for a number of other build-breakers, like enforcing correct LineFeeds and/or usage of Tab Characters, but there does not seem to be something like a FileEncodingChecker.

Question: If checkstyle simply can't do this: is there another plugin that might do this job?

A_Di-Matteo
Robert Heine

1 Answer


In Maven, the encoding of sources and resources is controlled by the standard project.build.sourceEncoding property, which as a good practice should always be present and set to UTF-8.
From the official documentation of the maven-resources-plugin:

The best practice is to define encoding for copying filtered resources via the property ${project.build.sourceEncoding} which should be defined in the pom properties section

This property is picked up as default value of the encoding property of the maven-compiler-plugin and the encoding property of the maven-resources-plugin.
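
As a minimal sketch, the properties section of the pom.xml could look like the following, assuming UTF-8 is the encoding you want for the whole project:

```xml
<properties>
    <!-- Default encoding picked up by maven-compiler-plugin and maven-resources-plugin -->
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
```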


To further enforce its presence, you could then use the maven-enforcer-plugin and its requireProperty rule to check that the project.build.sourceEncoding property exists and that its value is UTF-8. That is, the build would fail if the property was not set or did not have this exact value.

Below is an example of such a configuration, to add to the build/plugins section of your pom.xml file:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <version>1.4.1</version>
    <executions>
        <execution>
            <id>enforce-property</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <requireProperty>
                        <property>project.build.sourceEncoding</property>
                        <message>Encoding must be set and at UTF-8!</message>
                        <regex>UTF-8</regex>
                        <regexMessage>Encoding must be set and at UTF-8</regexMessage>
                    </requireProperty>
                </rules>
                <fail>true</fail>
            </configuration>
        </execution>
    </executions>
</plugin>

Note, the same could be done for the project.reporting.outputEncoding property.
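
As a sketch of that, a second rule could be placed next to the existing requireProperty element inside the rules section of the enforcer configuration. The message wording here is only my own suggestion:

```xml
<requireProperty>
    <!-- Same check as for the source encoding, applied to the reporting output encoding -->
    <property>project.reporting.outputEncoding</property>
    <message>Reporting output encoding must be set to UTF-8!</message>
    <regex>UTF-8</regex>
    <regexMessage>Reporting output encoding must be set to UTF-8</regexMessage>
</requireProperty>
```

This assumes you also define project.reporting.outputEncoding in the pom properties section, otherwise the build would fail on the missing property.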


Bonus: since we are on Stack Overflow, the CEO would probably be happy to see his old article back again: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets


Test
Given the following Java code:

package com.sample;

public class Main {

    public void 漢字() {
    }

}

and setting the following in Maven:

<properties>
    <project.build.sourceEncoding>US-ASCII</project.build.sourceEncoding>
</properties>

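This would actually make the build fail, since US-ASCII is a 7-bit encoding and the non-ASCII characters in the method name would result in illegal character errors. The same would not happen with UTF-8, a variable-width encoding that uses one to four 8-bit bytes per character and can therefore represent them.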

A_Di-Matteo
  • @a-di-matteo: Thanks for this extensive answer, works perfectly for me. I'm still feeling a bit unhappy about using two plugins for achieving some (IMHO) simple build breaking scenario. – Robert Heine Jun 14 '16 at 07:19
  • I tried replacing the regex/property value with `ISO-8859-1` and also added the plugin to the root pom, since I want to enforce this encoding on my project, but it didn't work. I have a multi-module project; am I missing something? – Amol Bais Apr 12 '23 at 15:02