How to deal with BOM in InputStream

Question

Possible Duplicate:
Byte order mark screws up file reading in Java

public Collection<String> getLines(String path) throws SftpException
{
   BufferedReader reader = null;
   try
   {
      reader = new BufferedReader(new InputStreamReader(get(path)));

      Collection<String> result = new ArrayList<String>();
      String line;
      while((line = reader.readLine()) != null)
      {
         result.add(line);
      }
      return result;
   }
   catch (IOException e)
   {
      throw new SftpException("Could not get lines from '"+path+"'.", e);
   }
   finally
   {
      if(reader != null)
         try
         {
            reader.close();
         }
         catch (IOException e)
         {
            throw new SftpException("Failed to close stream", e);
         }
   }
}

I use the above method to get all the lines in a file located on an SFTP server. The get(path) method returns the file content as an InputStream. In my particular case the file is a CSV with a number of grouped orders. To tell whether a line is an order or the header of a new group, I check line.startsWith("HDR").

My problem is that my code skips over the first header line. When I stepped through it in the debugger, I found that the first line in my collection has a strange character before the HDR part. I suspect that it is a UTF-8 BOM or something similar. So, how do I deal with this? How do I read a UTF-8 file correctly? Is there a way to check whether it is in fact a UTF-8 file?

Update: Found a solution in Byte order mark screws up file reading in Java, so closing this :)
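
For reference, here is a minimal sketch of one common way to handle this, assuming the file really is UTF-8. It is not necessarily the approach from the linked question. The class name BomSkipper and both method names are made up for illustration. The idea is to peek at the first three bytes of the stream, drop them if they are the UTF-8 BOM (EF BB BF), push them back otherwise, and then decode with an explicit charset instead of the platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
public class BomSkipper {

    // Length of the UTF-8 encoding of U+FEFF (bytes EF BB BF).
    private static final int BOM_LENGTH = 3;

    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, BOM_LENGTH);
        byte[] head = new byte[BOM_LENGTH];
        int read = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream is shorter than three bytes
            }
            read += n;
        }
        boolean isBom = read == BOM_LENGTH
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && read > 0) {
            // Not a BOM: put the bytes back so the caller sees them.
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    // Reads all lines as UTF-8, ignoring a leading BOM.
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }
}
```

In getLines above, this would amount to passing get(path) through BomSkipper.skipUtf8Bom before wrapping it in the InputStreamReader. Note that a BOM only suggests the file is UTF-8; its absence proves nothing, since the BOM is optional in UTF-8. A cruder alternative is to decode as UTF-8 and strip a leading '\uFEFF' character from the first line.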

This SO question may have a solution to your problem: http://stackoverflow.com/questions/1835430/byte-order-mark-screws-up-file-reading-in-java — Andreas Dolk, May 23 '11 at 08:41
