Possible Duplicate:
Byte order mark screws up file reading in Java
public Collection<String> getLines(String path) throws SftpException
{
    BufferedReader reader = null;
    try
    {
        reader = new BufferedReader(new InputStreamReader(get(path)));
        Collection<String> result = new ArrayList<String>();
        String line;
        while((line = reader.readLine()) != null)
        {
            result.add(line);
        }
        return result;
    }
    catch (IOException e)
    {
        throw new SftpException("Could not get lines from '"+path+"'.", e);
    }
    finally
    {
        if(reader != null)
            try
            {
                reader.close();
            }
            catch (IOException e)
            {
                throw new SftpException("Failed to close stream", e);
            }
    }
}
I use the above method to get all the lines in a file located on an SFTP server. The get(path) method returns the file content as an InputStream. In my particular case the file is a CSV with a number of grouped orders. To check if a line is an order or the header of a new group, I do line.startsWith("HDR").
My problem is that I suddenly discovered that my code skips over the first header line. When I stepped through in the debugger, I discovered that the first line in my collection actually has some weird character before the HDR part. I suspect that it is a UTF-8 BOM or something like that. So, how do I deal with this? How do I read in a UTF-8 file correctly? Is there a way I can check if it in fact is a UTF-8 file?
Update: Found a solution in Byte order mark screws up file reading in Java, so closing this :)
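For anyone landing here, this is a minimal sketch of one way to handle it, not necessarily the approach from the linked question. It assumes the file really is UTF-8: the stream is decoded explicitly as UTF-8 (Java's UTF-8 decoder leaves a leading BOM in place as the character U+FEFF), and that character is dropped from the first line if present. The class name BomAwareLines and the method readLines are made up for illustration; in my code the InputStream would come from get(path). As far as I can tell, a BOM is only a hint that a file is UTF-8. Its absence proves nothing, so there is no fully reliable check.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class BomAwareLines {
    // A UTF-8 BOM (bytes EF BB BF) decodes to this single character.
    private static final char BOM = '\uFEFF';

    // Reads all lines as UTF-8 and strips a leading BOM from the first line.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream).
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1); // drop the BOM character
                }
                first = false;
                result.add(line);
            }
        }
        return result;
    }
}
```

With this, line.startsWith("HDR") should match the first header line again, whether or not the file starts with a BOM.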