I've looked around for answers to this (I'm sure they're out there), and I'm not sure it's possible.
I have a huge file that contains the word "för". I'm using RandomAccessFile because I know roughly where the word is and can therefore use seek() to get there.
To know that I've found it, I have a String "för" in my program that I check for equality. Here's the problem: I ran the debugger, and when I get to "för" in the file, what I get to compare is "fÃ¶r".
So my program terminates without finding any "för".
This is the code I use to get a word:
    private static String getWord(RandomAccessFile file) throws IOException {
        StringBuilder stb = new StringBuilder();
        char c = (char) file.read();
        int end;
        do {
            stb.append(c);
            end = file.read();
            if (end == -1)
                return "-1";
            c = (char) end;
        } while (c != ' ');
        // trim() returns a new String, so its result has to be used
        return stb.toString().trim();
    }
So I return all the characters from the current position in the file up to the first ' ' character. That gives me the word, but since (char)file.read() reads a single byte (I think), the UTF-8 'ö' becomes the two characters 'Ã' and '¶'?
One reason for this guess is that if I open my file with encoding UTF-8 it's "för", but if I open the file with ISO-8859-15, the same place shows exactly what my getWord method returns: "fÃ¶r".
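To illustrate what I think is happening, here is a minimal sketch that reproduces the effect without any file. The class name Utf8BytesDemo and the method misread are made up for this example; the loop just mimics casting each byte to a char the way getWord does. The non-ASCII letters are written as Unicode escapes so the source file's own encoding doesn't matter.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {

    // Hypothetical helper: encodes the text as UTF-8, then turns every
    // single byte into its own char, like (char) file.read() does.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            // file.read() yields 0-255, so mask off the sign before casting
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String expected = "f\u00f6r";        // "för"
        String garbled = "f\u00c3\u00b6r";   // 'f', 'Ã', '¶', 'r'

        // 'ö' is two bytes in UTF-8 (0xC3 0xB6), so the result is one char longer
        System.out.println(misread(expected).length());        // prints 4
        System.out.println(misread(expected).equals(garbled)); // prints true
    }
}
```

If this matches what the debugger shows, the file itself is fine and only the byte-by-byte reading is off.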
So my question:
When I'm sitting with a "fÃ¶r" and a "för", is there any way to fix this? Like saying "read "fÃ¶r" as if it were a UTF-8 string" to get "för"?
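What I'm hoping for is something along these lines. This is only a sketch, and it assumes every char in the garbled string is really a raw byte value (0-255), which should hold if the string was built by casting file.read() results to char. Redecode and latin1ToUtf8 are names I made up; String.getBytes(Charset) and new String(byte[], Charset) are the standard library calls.

```java
import java.nio.charset.StandardCharsets;

public class Redecode {

    // Hypothetical helper: maps each char back to the single byte it came
    // from (ISO-8859-1 is a 1:1 mapping for 0-255), then decodes those
    // bytes as UTF-8.
    static String latin1ToUtf8(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "f\u00c3\u00b6r";  // what getWord returns
        String wanted = "f\u00f6r";         // "för"

        System.out.println(latin1ToUtf8(garbled).equals(wanted)); // prints true
    }
}
```

I don't know whether this is the recommended way, or whether it would be better to collect the bytes and decode them as UTF-8 once instead of casting each one to char.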