How to check for UTF-8 BOM in files in Groovy?

Question

I don't want to load the whole file into memory, and I don't want to make any assumptions about the underlying OS.

I'm left with this:

echo it, "Checking file.. ${file.absolutePath}"
def fis = new FileInputStream(file)
def openingBytes = new byte[3]
try {
    fis.read(openingBytes)

    if (openingBytes.encodeHex() =~ /^efbbbf/) {
        errors << file.path + " - File needs to be converted from UTF-8 BOM to UTF-8 without BOM"
    }
} catch (Exception e) {
    errors << "Encountered an error trying to check " + file.path + " for BOMs."
} finally {
    fis.close()
}

But that seems awfully verbose and Java-like. :-(

score 2 · Accepted Answer · answered Nov 20 '13 at 20:07


How about:

file.withInputStream { fis ->
    byte[] openingBytes = new byte[3]
    fis.read( openingBytes )
    if( openingBytes == [ 0xEF, 0xBB, 0xBF ] as byte[] ) {
        errors << file.path + " - File needs to be converted from UTF-8 BOM to UTF-8 without BOM"
    }
}
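
If the check is needed in more than one place, it may help to wrap it in a small helper that returns a boolean. The following is only a sketch: the name `hasUtf8Bom` is hypothetical and not part of the answer above. It also loops on `read`, since a single call is allowed to return fewer bytes than requested, and it treats files shorter than three bytes as having no BOM.

```groovy
// Hypothetical helper (not from the original answer):
// returns true if the file starts with the UTF-8 BOM bytes EF BB BF.
boolean hasUtf8Bom(File file) {
    byte[] bom = [0xEF, 0xBB, 0xBF] as byte[]
    file.withInputStream { InputStream is ->
        byte[] head = new byte[3]
        int total = 0
        // read() may deliver fewer bytes than asked for, so keep reading
        while (total < head.length) {
            int n = is.read(head, total, head.length - total)
            if (n < 0) break          // end of file before 3 bytes were read
            total += n
        }
        // Only the first three bytes are ever read; the stream is closed by withInputStream
        total == bom.length && Arrays.equals(head, bom)
    }
}

// Usage with a temporary file that starts with a BOM followed by "hi"
File sample = File.createTempFile('bom-check', '.txt')
sample.bytes = [0xEF, 0xBB, 0xBF, 0x68, 0x69] as byte[]
assert hasUtf8Bom(sample)
sample.delete()
```

The caller can then decide what to do with the result, for example appending to `errors` as in the question.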


tim_yates


Definitely better :-) – Jun-Dai Bates-Kobashigawa Nov 20 '13 at 20:10

score 0 · Answer 2 · edited May 23 '17 at 10:33


Groovy can use Java libraries, and there is a Java solution for this in Apache Commons IO.

Have a look at the answer in this thread:

Reading UTF-8 - BOM marker

The link to Apache Commons IO in that thread no longer works; here is the correct one:

http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html
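
As a rough illustration of how `BOMInputStream` could be used from Groovy, here is a minimal sketch. The helper name `startsWithUtf8Bom` is hypothetical, and the `@Grab` coordinates and version are assumptions (they need Grape and network access); use whatever Commons IO version your build already provides. With the single-argument constructor the stream looks for a UTF-8 BOM only, and `hasBOM()` reports whether one was found.

```groovy
// Version below is an assumption; any Commons IO release that ships BOMInputStream should do
@Grab('commons-io:commons-io:2.11.0')
import org.apache.commons.io.input.BOMInputStream

// Hypothetical helper: true if the file begins with a UTF-8 BOM
boolean startsWithUtf8Bom(File file) {
    file.withInputStream { raw ->
        // The single-argument constructor detects (and skips) the UTF-8 BOM only
        new BOMInputStream(raw).hasBOM()
    }
}

// Usage with a temporary file that starts with a BOM followed by "hi"
File sample = File.createTempFile('bom-check', '.txt')
sample.bytes = [0xEF, 0xBB, 0xBF, 0x68, 0x69] as byte[]
assert startsWithUtf8Bom(sample)
sample.delete()
```

Wrapping the stream this way also lets the rest of the file be read without the BOM, which goes beyond a pure check.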


answered Dec 23 '14 at 11:39

Nicole Naumann

