
I use Java for file reading. Here's my code:

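      import java.io.BufferedReader;
      import java.io.FileInputStream;
      import java.io.IOException;
      import java.io.InputStreamReader;
      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;

      public static String[] fajlbeolvasa(String s) throws IOException
      {
        ArrayList<String> list = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(s), StandardCharsets.UTF_8)))
        {
          String line;
          // readLine() returns null at end of file
          while ((line = reader.readLine()) != null)
          {
            list.add(line);
          }
        }
        return list.toArray(new String[0]);
      }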

However, when I read the file, the output is malformed. For example: "Farkasgyep\305\261". Maybe something is wrong with the BOM. How can I solve this problem in Java? I would be grateful for any help.

Milky
  • Please refer to this link: http://stackoverflow.com/questions/14918188/reading-text-file-with-utf-8-encoding-using-java – Vishal Gajera Oct 28 '15 at 09:51
  • Thank you, but that did not solve the problem, because I do not know the character encoding. The file comes from Wireshark and its type is *.pdml. – Milky Oct 28 '15 at 10:38
  • The BOM normally occurs only at the very beginning of a file. Since you are getting unexpected characters at the end of a string, your problem almost certainly has nothing to do with the BOM. This means you have asked the wrong question and you are getting correct but unhelpful answers. I think your problem is that you believe your file is encoded as UTF-8, but it is actually encoded as something else. – k314159 Jul 07 '21 at 10:10
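
One way to check what the escaped output in the question actually contains: \305 and \261 are octal byte values (0xC5 and 0xB1), and that pair is the UTF-8 encoding of ű (U+0171). The sketch below turns such escapes back into bytes and decodes them as UTF-8. The class and method names are hypothetical, and it assumes every character outside an escape is plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class OctalEscapeSketch {

    // Hypothetical helper: replaces each backslash + three octal digits with
    // the byte it denotes, then decodes the whole byte sequence as UTF-8.
    static String decodeOctalEscapes(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 3 < escaped.length() && isOctal(escaped, i + 1, 3)) {
                out.write(Integer.parseInt(escaped.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                out.write(c); // assumption: non-escaped characters are ASCII
                i++;
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if the 'count' characters starting at 'from' are all digits 0-7.
    private static boolean isOctal(String s, int from, int count) {
        for (int k = from; k < from + count; k++) {
            char d = s.charAt(k);
            if (d < '0' || d > '7') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The backslashes are doubled so Java does not interpret them itself.
        System.out.println(decodeOctalEscapes("Farkasgyep\\305\\261"));
    }
}
```

If the decoded string is the expected place name, the bytes in the file are valid UTF-8 and the escapes come from whatever displays the output, not from the reading code.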

1 Answer


You can check for a BOM as follows. This treats the file as a byte[], so you shouldn't have any problem using it with your file:

import java.util.Arrays;
import org.apache.commons.codec.binary.Hex; // Apache Commons Codec

// Hex form of the UTF-8 BOM (EF BB BF); declared at class level so both methods can see it.
private static final String BOM_HEX_ENCODE = "efbbbf";

private static boolean isBOMPresent(byte[] content) {
    // Fewer than three bytes cannot hold a UTF-8 BOM.
    if (content == null || content.length < 3) {
        return false;
    }

    // Hex-encode the first three bytes and compare them with the BOM signature.
    String firstBytes = new String(Hex.encodeHex(Arrays.copyOf(content, 3)));
    return BOM_HEX_ENCODE.equalsIgnoreCase(firstBytes);
}

Then, if you need to remove it, you can use this:

public static byte[] removeBOM(byte[] fileWithBOM) {
    if (isBOMPresent(fileWithBOM)) {
        // Skip the three BOM bytes and return the rest of the content.
        return Arrays.copyOfRange(fileWithBOM, 3, fileWithBOM.length);
    }
    // No BOM found: return the input unchanged.
    return fileWithBOM;
}
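
For completeness, here is a rough sketch of how the BOM stripping could be combined with the file reading from the question. It uses a plain byte comparison instead of the hex helper so that it needs only the JDK. The class and method names are hypothetical, and it assumes the file really is UTF-8, which the comments on the question suggest you should verify first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class BomUsageSketch {

    // Drops a leading UTF-8 BOM (EF BB BF) if present, then decodes as UTF-8.
    static String decodeUtf8WithoutBom(byte[] raw) {
        boolean hasBom = raw.length >= 3
                && (raw[0] & 0xFF) == 0xEF
                && (raw[1] & 0xFF) == 0xBB
                && (raw[2] & 0xFF) == 0xBF;
        byte[] body = hasBom ? Arrays.copyOfRange(raw, 3, raw.length) : raw;
        return new String(body, StandardCharsets.UTF_8);
    }

    // Reads the whole file into memory and splits it into lines.
    // Assumption: the file is small enough to load at once.
    static String[] readLines(String path) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(path));
        return decodeUtf8WithoutBom(raw).split("\\R"); // \R matches any line break
    }
}
```

Calling BomUsageSketch.readLines(path) then returns the same kind of String[] as the method in the question, with any BOM already removed from the first line.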
Simone Lungarella