I have a CSV file in which some fields contain Chinese character strings. Unfortunately, I don't know the encoding of this input CSV file. I read this input CSV and, using selected fields from it, produce an HTML file and another CSV file as output.
While reading the CSV input, I tried every encoding in the list at http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html whose description mentions Chinese, and found that if I use
InputStreamReader read = new InputStreamReader(new FileInputStream(filepath), "GB18030");
for reading the CSV and
OutputStreamWriter osW=new OutputStreamWriter(objBufferedOutputStream,"UTF-16");
for writing the HTML and CSV, my output doesn't show weird characters.
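Rather than judging each candidate encoding by eye, one option is to decode the raw bytes strictly and see which charsets reject them. This is a minimal sketch under assumptions: the class name `CharsetProbe` and the candidate list are illustrative, and the file path is taken from the first command-line argument. It uses the standard `java.nio.charset.CharsetDecoder` with `CodingErrorAction.REPORT`, so invalid bytes raise an exception instead of being silently replaced, which is what `InputStreamReader` does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CharsetProbe {

    // Returns true if the bytes decode with the named charset without any
    // malformed or unmappable sequences. REPORT makes the decoder throw
    // instead of inserting U+FFFD.
    static boolean decodesCleanly(byte[] data, String charsetName) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java CharsetProbe <file.csv>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // Hypothetical candidate list; extend it as needed.
        String[] candidates = {"UTF-8", "GB18030", "GBK", "GB2312", "Big5"};
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this JVM");
                continue;
            }
            System.out.println(name + ": "
                    + (decodesCleanly(data, name) ? "decodes cleanly" : "has invalid bytes"));
        }
    }
}
```

A clean result is only a hint: GB18030 accepts a large share of arbitrary byte sequences, so it may "succeed" on a file that is really UTF-8, whereas non-ASCII text that is not UTF-8 rarely passes a strict UTF-8 check.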
But there are two problems:
- The output shows strings that are entirely different from the input. Even for strings my code does not process at all, the output text does not appear in any field of the input CSV.
For example, my input has the Chinese string 陈真珍 in field number 8, but the corresponding field in my output HTML contains something like 闄堢湡鐝�.
- As you can see, the output 闄堢湡鐝� contains a question-mark symbol, i.e. the Unicode replacement character (U+FFFD).
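One pattern worth checking: the garbled text 闄堢湡鐝� has four characters followed by a replacement character. That is what you would expect if 陈真珍 were stored as UTF-8 (3 bytes per character, 9 bytes in total) and then decoded as GB18030, which for these particular bytes consumes them in pairs and leaves one dangling byte at the end. A minimal sketch to test this hypothesis (the class `MojibakeDemo` and method `utf8BytesReadAs` are hypothetical names): if it prints the same string that appears in the HTML, the input CSV is probably UTF-8 rather than GB18030.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates text written as UTF-8 but read back with another charset.
    // new String(byte[], Charset) silently replaces undecodable input with
    // U+FFFD instead of throwing.
    static String utf8BytesReadAs(String text, String readCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName(readCharset));
    }

    public static void main(String[] args) {
        // "陈真珍" written with Unicode escapes so the demo does not depend on
        // the encoding of this .java source file.
        String original = "\u9648\u771F\u73CD";
        String garbled = utf8BytesReadAs(original, "GB18030");
        System.out.println(garbled);
        // Print each code point so the result can be compared with the HTML output.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```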
I would appreciate help tracing where the mistake might be.
PS: Also, I checked Google Translate and found that the input string 陈真珍 means something like "Chen Zhen Zhen", while the corresponding output string 闄堢湡鐝� means something like "Yaobaoyujue". So both the meaning and the characters themselves differ.