
I have a problem.

I wrote a Java program that does the following:

BufferedReader input = new BufferedReader(new FileReader("test.csv"));
String line = input.readLine();
int lengthOfLine = line.length();
// toCharArray() allocates the array itself, so no separate "new char[...]" is needed
char[] lineIndex = line.toCharArray();

Then, in a for loop, I make some checks such as if(lineIndex[i]=='|') or the same comparison against 'M', plus a few other checks of the same kind.

The problem is that although the program runs correctly on Windows 7, Vista, and XP (English and Greek), when I try to run it on Windows Vista (German) the check lineIndex[i]=='|' seems to be always false. Why does this happen? The test.csv file is the same, and I am sure that '|' exists in every line.

Is there a problem with Unicode or something?

How can I make this program run on every language version of Windows?

The test.csv file is always the same one, downloaded from the web.

I am sorry for my English. Thanks in advance.

M1llaaN0

1 Answer


The API documentation specifies that FileReader assumes the default character encoding of the machine on which it runs.

If you know the CSV is UTF-8 encoded, you could try:

FileInputStream fis = new FileInputStream("test.csv");
InputStreamReader isr = new InputStreamReader(fis, "UTF-8"); 
BufferedReader input = new BufferedReader(isr);
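
To see which encoding FileReader would assume on a particular machine, something like the following minimal sketch may help. The class name DefaultCharsetReport and its helper method are made up for illustration; Charset.defaultCharset() and the file.encoding system property are standard Java. The output depends on the operating system, its locale settings, and the Java version, so it is worth running on each machine being compared.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original program
class DefaultCharsetReport {
    // Returns the name of the charset the JVM uses when no encoding is given explicitly
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // The system property that usually reflects the same setting
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

If two machines print different names here, the same file bytes can decode to different characters on each of them when read through FileReader.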
Mark McLaren
  • When I did that, the program stopped working properly on Windows XP and Vista (Greek). Is there a way to know the encoding of the CSV file? May I use '\u007c' instead of '|'? Is there any difference? – M1llaaN0 Jul 04 '11 at 02:16
  • Use Cygwin with the "file" tool to check the file type and encoding, and "dos2unix" and "unix2dos" to switch between line-ending conventions and such. Don't you love it when Stack Overflow provides you the answer for re-encoding text files using Cygwin? http://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets – Maarten Bodewes Jul 04 '11 at 02:54
  • Hi M1llaaN0, could you explain what you mean by "stopped working properly"? Have you tried converting the file to UTF-8 (using something like Notepad++)? – Mark McLaren Jul 04 '11 at 14:12
  • By "stopped working properly" I mean that the check `lineIndex[i]=='|'` is always false, while '|' exists in every line of the CSV file. I know how to convert the encoding with Notepad++, but I want to send the program to a friend (who has German Windows), and he should have nothing to do except run the program. The main question is: why does it work on English Windows and not on German Windows? – M1llaaN0 Jul 04 '11 at 15:36
  • I'm confused. How are you testing this? Are you e-mailing the files, or passing them via a website? Could something be altering the files between the different systems? – Mark McLaren Jul 04 '11 at 21:12
  • It's a file downloaded from the internet. The only change is that my friend renames it from 566674.csv to test.csv, but I do the same. I ran the program on German Windows 7 in VMware Player and it had no problem, but on my friend's PC with German Windows Vista `lineIndex[i]=='|'` was always false. I'm really confused. Thanks for trying to help me. – M1llaaN0 Jul 04 '11 at 22:23
  • I used the method getEncoding, and on my system it prints cp1253 while on my friend's system it prints cp1252. My program uses the code you suggested, with UTF-8. Is '|' the same in both encodings? The last thing I tried was to let the user enter the character from the keyboard, but it is still not working. For example, the check is `lineIndex[i]=='c'`, where char c; is taken from the keyboard. – M1llaaN0 Jul 04 '11 at 22:42
  • This is a tricky one. This article is very good: http://illegalargumentexception.blogspot.com/2009/05/java-rough-guide-to-character-encoding.html It mentions that there is a CharsetDetector for Java (part of ICU4J) that you might be able to use to detect the encoding of your file (although it can never be 100% reliable). http://site.icu-project.org/ – Mark McLaren Jul 04 '11 at 23:35
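
Two of the questions raised in the comments (whether '\u007c' differs from '|', and whether '|' is the same in cp1252 and cp1253) can be checked with a short sketch like the one below. The class name PipeEncodingCheck and its encode helper are made up for illustration. The sketch only looks at how '|' is encoded by a few charsets that are ASCII-compatible in this position; it does not determine the actual encoding of test.csv or explain the failure on the German machine.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original program
class PipeEncodingCheck {
    // Encodes a single character with the named charset and returns the raw bytes
    static byte[] encode(char ch, String charsetName) {
        return String.valueOf(ch).getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The escape form is only another spelling of the same char, so this prints true
        System.out.println('|' == '\u007c');

        String[] names = {"US-ASCII", "ISO-8859-1", "UTF-8", "windows-1252", "windows-1253"};
        for (String name : names) {
            // The Windows code pages are optional charsets, so check before using them
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available in this JVM");
                continue;
            }
            byte[] bytes = encode('|', name);
            System.out.printf("%s: %d byte(s), first = 0x%02X%n", name, bytes.length, bytes[0]);
        }
    }
}
```

In each of these charsets '|' is the single byte 0x7C, so a difference between cp1252 and cp1253 alone would not be expected to change how that particular character is decoded.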