
I'm parsing files that may or may not start with a Byte Order Mark (BOM).

CharBuffer buffer = allocateBuffer();
reader.read(buffer);
buffer.flip();
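The question does not show how reader is created. Below is a minimal sketch. It assumes the files are UTF-8, and allocateBuffer() is a hypothetical stand-in with an arbitrary capacity. The charset is passed to InputStreamReader explicitly. By contrast, new InputStreamReader(in) without a charset uses the JVM's default charset, which need not be the same in an IDE run and a console run.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8BufferReader {

    // Hypothetical stand-in for the question's allocateBuffer(); the real capacity is not shown.
    static CharBuffer allocateBuffer() {
        return CharBuffer.allocate(8192);
    }

    // Assumption: the input is UTF-8. Naming the charset avoids the platform
    // default that new InputStreamReader(in) would otherwise use.
    static CharBuffer readFirstChunk(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        CharBuffer buffer = allocateBuffer();
        reader.read(buffer); // decodes chars into the buffer starting at position 0
        buffer.flip();       // switch the buffer from filling to draining
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 BOM (EF BB BF) followed by "###"
        byte[] bytes = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '#', '#', '#'};
        CharBuffer buffer = readFirstChunk(new ByteArrayInputStream(bytes));
        System.out.printf("first=%x remaining=%d%n", (int) buffer.get(0), buffer.remaining());
    }
}
```

Given the UTF-8 BOM bytes EF BB BF, the buffer starts with U+FEFF, the same first char as in the NetBeans dump further down. Java's UTF-8 decoder keeps the BOM rather than dropping it.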

Later, the following method init() is invoked to skip the BOM:

private void init() {
    char first;

    if (buffer.hasRemaining()) {
        first = buffer.get();
        if (!isByteOrderMark(first)) {
            buffer.rewind();
        }
    }
}
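The question does not show isByteOrderMark. The sketch below reproduces init()'s logic. It assumes isByteOrderMark simply compares against U+FEFF; this is hypothetical, and the real implementation may differ. The class and method names are also hypothetical.

```java
import java.nio.CharBuffer;

final class BomSkipper {

    private static final char BOM = '\uFEFF';

    // Hypothetical implementation: a UTF-8 BOM decoded as UTF-8 is the single char U+FEFF.
    static boolean isByteOrderMark(char c) {
        return c == BOM;
    }

    // Same effect as init(): consume the first char only if it is a BOM.
    // Peeking with get(index) leaves the position untouched when there is no BOM.
    static void skipByteOrderMark(CharBuffer buffer) {
        if (buffer.hasRemaining() && isByteOrderMark(buffer.get(buffer.position()))) {
            buffer.position(buffer.position() + 1);
        }
    }
}
```

The original get()/rewind() pair behaves the same here. rewind() returns to position 0, which is where a freshly flipped buffer starts. Either way, only a char equal to U+FEFF is skipped, and any other first char is left for the tokenizer.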

This works fine when run in NetBeans:

Parsing file "common\name_lists\AI.txt"...
[FILL]  file_size=6878, total_line=140
[PARSE] parent=null, state=0
[NEXT]  line=1, next="AI"
    cache=2 [=, {]
[NEXT]  line=1, next="="
    cache=1 [{]
[NEXT]  line=1, next="{"
    cache=0 []

However, when run from the console, it produces the following output:

Parsing file "common/name_lists/AI.txt"...
    [FILL]  file_size=6879, total_line=140
    [PARSE] parent=null, state=0
    [NEXT]  line=1, next="�?"
        cache=2 [AI, =]
    [NEXT]  line=1, next="AI"
        cache=1 [=]
    [NEXT]  line=1, next="="
        cache=0 []
    [NEXT]  line=2, next="{"
        cache=2 [selectable, =]
    Exception in thread "main" com.stellaris.TokenException: {
        at com.stellaris.ScriptFile.handlePlainList(ScriptFile.java:269)
        at com.stellaris.ScriptFile.analyze(ScriptFile.java:109)
        at com.stellaris.ScriptFile.analyze(ScriptFile.java:57)
        at com.stellaris.ScriptFile.<init>(ScriptFile.java:50)
        at com.stellaris.ScriptFile.<init>(ScriptFile.java:45)
        at com.stellaris.ScriptFile.newInstance(ScriptFile.java:38)
        at com.stellaris.ScriptFile.main(ScriptFile.java:280)

Then I decompiled the class file, and it looks fine:

Compiled from "ScriptParser.java"
public final class com.stellaris.ScriptParser {
  public com.stellaris.ScriptParser(java.io.Reader);
  private void init();
  private static boolean isByteOrderMark(char);
  private static java.nio.CharBuffer allocateBuffer();
}

Bytecode of method init():

  private void init();
    Code:
       0: aload_0
       1: getfield      #12                 // Field buffer:Ljava/nio/CharBuffer;
       4: invokevirtual #13                 // Method java/nio/CharBuffer.hasRemaining:()Z
       7: ifeq          33
      10: aload_0
      11: getfield      #12                 // Field buffer:Ljava/nio/CharBuffer;
      14: invokevirtual #14                 // Method java/nio/CharBuffer.get:()C
      17: istore_1
      18: iload_1
      19: invokestatic  #15                 // Method isByteOrderMark:(C)Z
      22: ifne          33
      25: aload_0
      26: getfield      #12                 // Field buffer:Ljava/nio/CharBuffer;
      29: invokevirtual #16                 // Method java/nio/CharBuffer.rewind:()Ljava/nio/Buffer;
      32: pop
      33: return

Method init() is invoked in the constructor ScriptParser(Reader reader):

  public com.stellaris.ScriptParser(java.io.Reader);
    Code:
       0: aload_0
       1: invokespecial #2                  // Method java/lang/Object."<init>":()V
       4: aload_0
       5: new           #3                  // class java/io/BufferedReader
       8: dup
       9: aload_1
      10: invokespecial #4                  // Method java/io/BufferedReader."<init>":(Ljava/io/Reader;)V
      13: putfield      #5                  // Field reader:Ljava/io/BufferedReader;
      16: aload_0
      17: new           #6                  // class java/util/LinkedList
      20: dup
      21: invokespecial #7                  // Method java/util/LinkedList."<init>":()V
      24: putfield      #8                  // Field deque:Ljava/util/LinkedList;
      27: aload_0
      28: invokespecial #9                  // Method fill:()V
      31: aload_0
      32: iconst_0
      33: putfield      #10                 // Field lineCounter:I
      36: aload_0
      37: invokespecial #11                 // Method init:()V
      40: return

As shown, init() is invoked.

First 4 characters (hex, NetBeans):

chars=feff 23 23 23

First 4 characters (hex, console):

chars=9518 fffd 23 23
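The console dump fits a pattern where the UTF-8 BOM bytes EF BB BF are decoded with a charset other than UTF-8. This is an inference, not something the post confirms. Assume the console's default charset is GBK; the post never names it. Then EF BB decodes to U+9518, and the lone BF is malformed, which yields U+FFFD. init() then sees 9518 instead of feff and skips nothing. The same inference would explain why file_size is one larger on the console, if that figure counts decoded chars. The sketch below decodes the same bytes both ways. It uses a CharsetDecoder with REPLACE, which is how a Reader's decoder is configured.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class BomDecodeDemo {

    // UTF-8 BOM (EF BB BF) followed by "##", like the start of the dumps above.
    static final byte[] BYTES = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '#', '#'};

    // Formats each char as lowercase hex, like the "chars=..." lines above.
    static String hexChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    // Decodes BYTES, replacing malformed or unmappable input with U+FFFD.
    static String decode(Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .decode(ByteBuffer.wrap(BYTES))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8: " + hexChars(decode(StandardCharsets.UTF_8)));
        // Assumption: GBK stands in for the console's charset; it is not in every runtime.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:   " + hexChars(decode(Charset.forName("GBK"))));
        }
    }
}
```

Running main in both environments also prints Charset.defaultCharset(), which is a quick way to check whether the IDE and the console really differ.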

javac version: 1.8.0_73

java version: 1.8.0_73

KaiserKatze

1 Answer


I'm using the following class to avoid this problem, though I still don't know why it happens.

/*
 * Copyright (C) 2016 donizyo
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */
package net.donizyo.io;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.input.BOMInputStream;

/**
 *
 * @author donizyo
 */
public class BOMReader extends BufferedReader {

    public static final String DEFAULT_ENCODING = "UTF-8";

    public BOMReader(File file) throws IOException {
        this(file, DEFAULT_ENCODING);
    }

    private BOMReader(File file, String encoding) throws IOException {
        this(new FileInputStream(file), encoding);
    }

    private BOMReader(FileInputStream input, String encoding) throws IOException {
        this(new BOMInputStream(input), encoding);
    }

    private BOMReader(BOMInputStream input, String encoding) throws IOException {
        super(new InputStreamReader(input, getCharset(input, encoding)));
    }

    private static String getCharset(BOMInputStream bomInput, String encoding) throws IOException {
        ByteOrderMark bom;

        bom = bomInput.getBOM();
        return bom == null ? encoding : bom.getCharsetName();
    }
}
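The answer's BOMReader depends on Apache Commons IO. Below is a JDK-only sketch of the same idea. It uses PushbackInputStream, which is a swapped-in technique and not what the answer uses. Like BOMInputStream's default constructor, it recognizes only the UTF-8 BOM. Note that BOMReader always hands InputStreamReader an explicit charset: UTF-8, unless a detected BOM names another. That may be the real reason the IDE/console difference disappears, though the answer does not confirm it. The class and method names below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8BomReaders {

    // Bytes of the UTF-8 encoding of U+FEFF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a reader that decodes `in` as UTF-8 and drops a leading UTF-8 BOM if present.
    static BufferedReader open(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }
        boolean hasBom = n == head.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back
        }
        // Explicit charset, so decoding does not depend on the platform default.
        return new BufferedReader(new InputStreamReader(pushback, StandardCharsets.UTF_8));
    }
}
```

For a file, Utf8BomReaders.open(new FileInputStream(file)) takes the place of new BOMReader(file).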