I have a program that grabs some strings from a location and puts them in a list. I also have an "exclusions list" that loads from a file. If the current string is in the exclusions list, it gets ignored.

In my exclusions list file, I have this string: Something ›

Note that this is not a typical angle bracket. It's a special character (decimal value 8250, i.e. U+203A).

When I run this in Eclipse, everything works perfectly. My program sees that the Something › is in the exclusions list and ignores it. However, when I build and run my program as a jar, the Something › does not get ignored. Everything else works fine, it's just that one string.

I'm assuming it's because of the ›, which means it must be encoding-related. However, I have the text file saved as UTF-8 (without BOM), and my Eclipse is configured as UTF-8, too. Any ideas?

Andrio
  • 1
    Read up on the file.encoding property of the JVM. I think in your case Eclipse is handling it under the hood. You need to read the string in an OS-independent way, as suggested here: http://stackoverflow.com/questions/17467394/how-to-get-unicode-in-jar-files – Som Bhattacharyya Aug 03 '15 at 14:40

1 Answer

This seems to have fixed it. I changed the way it loaded the text file from:

Scanner fileIn = new Scanner(new File(filePath));

to

Scanner fileIn = new Scanner(new FileInputStream(filePath), "UTF-8");
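
For anyone wondering why this helps: without an explicit charset, the file is decoded with the JVM's default charset, which can differ between an Eclipse launch and a plain `java -jar` run. Here is a minimal sketch of the effect. It is not my actual program: the class name `ExclusionLoader` and the helpers `loadExclusions` and `decode` are made up for illustration, and it uses `Files.readAllLines` instead of `Scanner`. ISO-8859-1 stands in for "some non-UTF-8 default"; the real default depends on the machine.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class ExclusionLoader {

    // Hypothetical helper: reads one exclusion per line, always decoding as UTF-8
    // regardless of the JVM's default charset.
    static Set<String> loadExclusions(Path file) throws IOException {
        return new HashSet<>(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Hypothetical helper: decodes raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        String entry = "Something \u203A"; // U+203A is the special angle character
        byte[] utf8Bytes = entry.getBytes(StandardCharsets.UTF_8);

        // Decoding with the right charset round-trips the string.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals(entry));

        // Decoding the same bytes with a different charset yields a different
        // string, so a lookup in the exclusions list fails.
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).equals(entry));

        // Reading a UTF-8 file with an explicit charset finds the entry.
        Path file = Files.createTempFile("exclusions", ".txt");
        Files.write(file, (entry + "\n").getBytes(StandardCharsets.UTF_8));
        System.out.println(loadExclusions(file).contains(entry));
        Files.delete(file);
    }
}
```

The first and third checks succeed and the second does not, which matches what I saw: only the one string containing a non-ASCII character was affected.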
Andrio
  • 1
    Yup, good job. That forces the file to be read as UTF-8. :) – Som Bhattacharyya Aug 03 '15 at 14:59
  • 1
    You should almost always specify encodings for `Scanner`, `Reader`, and `Writer` classes. Especially when communicating with other systems, you want the encoding to be well-known. The only real exception I can think of is `System.in` and `System.out`, since the terminal will probably be using the default system encoding. – David Ehrmann Aug 03 '15 at 17:19
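
To illustrate the advice about `Reader` and `Writer`, here is a rough sketch that passes the charset explicitly on both sides of an in-memory round trip. The class name `ExplicitEncodingDemo` and the helpers `writeUtf8` and `readUtf8` are invented for this example; real code would typically wrap file or socket streams instead of byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingDemo {

    // Hypothetical helper: encodes text as UTF-8 through a Writer.
    static byte[] writeUtf8(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: decodes UTF-8 bytes through a Reader.
    static String readUtf8(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buffer = new char[256];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The default varies by platform and JVM version, so it is only printed here.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String original = "Something \u203A";
        byte[] bytes = writeUtf8(original);
        // 10 ASCII bytes plus 3 bytes for U+203A in UTF-8.
        System.out.println("Encoded length: " + bytes.length);
        System.out.println("Round trip ok: " + readUtf8(bytes).equals(original));
    }
}
```

Because neither side relies on the default charset, the round trip behaves the same whether the program is started from an IDE or from a jar.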