
I have a CSV file that I need to process. When I read it, I see unknown special characters at the very start of the first line. I am not sure why this happens or how to resolve it.

Here is the code snippet I am using:

CSVReader reportTypesReader = new CSVReader(new FileReader(Paths.get(filePath.concat("/ReportTypes.csv")).toFile()));

String[] nextLine;
// reads one line at a time
while ((nextLine = reportTypesReader.readNext()) != null) {
    for (String token : nextLine) {
        System.out.print(token);
    }
    System.out.print("\n");
}

Here is the sample output:

  ï»¿Report Type, Icon URL

What I expect is:

  Report Type, Icon URL
purush

1 Answer


BOM

Yes, that is a BOM (Byte Order Mark).

ï»¿ are the characters you get when the three octets that make up a BOM in UTF-8 are misinterpreted as individual code points, for example when the file is decoded as ISO-8859-1 instead of UTF-8. In decimal, those three octets are 239 187 191. In hex, EF BB BF.
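To make the misinterpretation concrete, here is a minimal sketch using only the JDK. The class name BomDemo and its method names are made up for illustration. It decodes the same three octets once as ISO-8859-1 (one character per octet) and once as UTF-8 (a single code point, U+FEFF), and prints the resulting code points in hex so the console encoding does not get in the way.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class BomDemo {
    // The three octets of the UTF-8 BOM: EF BB BF (239 187 191 in decimal).
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Decodes the BOM octets as ISO-8859-1, which maps each octet to one character. */
    static String decodedAsLatin1() {
        return new String(UTF8_BOM, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the BOM octets as UTF-8, which yields the single code point U+FEFF. */
    static String decodedAsUtf8() {
        return new String(UTF8_BOM, StandardCharsets.UTF_8);
    }

    /** Formats every UTF-16 code unit of the text as U+XXXX, separated by spaces. */
    static String codePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + codePoints(decodedAsLatin1()));
        System.out.println("UTF-8:      " + codePoints(decodedAsUtf8()));
    }
}
```

The ISO-8859-1 line shows the three code points U+00EF, U+00BB and U+00BF, which are the three stray characters at the start of the output in the question.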

You can try using BOMInputStream - https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html

If your CSVReader has a constructor that accepts an InputStream, you can pass it new BOMInputStream(new FileInputStream(file)). This creates a wrapper over the FileInputStream object, and BOMInputStream makes sure you get the content without the BOM. This should work whether or not the file starts with a BOM.
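If adding Commons IO is not an option, the same idea can be approximated with the JDK alone. The sketch below is not BOMInputStream. It is a swapped-in, simplified technique that assumes the file is UTF-8: it decodes the stream as UTF-8 and drops the first character if it is U+FEFF. The class and method names (BomSkippingReaders, utf8ReaderWithoutBom, decodeWithoutBom) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class BomSkippingReaders {
    // A UTF-8 BOM decodes to this single code point.
    private static final int BOM = 0xFEFF;

    /** Returns a UTF-8 reader positioned after a leading BOM, if one is present. */
    static Reader utf8ReaderWithoutBom(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        reader.mark(1);              // remember the start so we can rewind
        if (reader.read() != BOM) {
            reader.reset();          // first character was real data (or EOF): rewind
        }
        return reader;
    }

    /** Convenience for in-memory data: decodes the bytes as UTF-8 without a leading BOM. */
    static String decodeWithoutBom(byte[] data) {
        try (Reader reader = utf8ReaderWithoutBom(new ByteArrayInputStream(data))) {
            StringBuilder sb = new StringBuilder();
            for (int c = reader.read(); c != -1; c = reader.read()) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated file content: UTF-8 BOM followed by the header line from the question.
        byte[] header = "Report Type, Icon URL".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[header.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(header, 0, withBom, 3, header.length);
        System.out.println(decodeWithoutBom(withBom));
    }
}
```

Since the code in the question already passes a Reader (a FileReader) to CSVReader, the Reader returned by utf8ReaderWithoutBom(new FileInputStream(file)) could be passed in its place. Unlike BOMInputStream, this sketch does not recognise UTF-16 or UTF-32 BOMs.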

Basil Bourque
Ishan
  • And if there is no signature that accepts an InputStream, you can use ```InputStreamReader reader = new InputStreamReader(new BOMInputStream(new FileInputStream(File)));``` – Ishan Apr 17 '23 at 14:35
  • The first three symbols you see are the code points \u00ef\u00bb\u00bf, and that is definitely a BOM. You can search for "\u00ef\u00bb\u00bf" or BOM and find plenty of information about it – Michael Gantman Apr 17 '23 at 14:47