
I am getting an XML file that is UCS-2 encoded. I want to convert it to UTF-8, UTF-16, or ANSI using Java code.

Could you please help with this?

Kushal Karia
  • Firstly, you'd have to define what you mean by "ANSI", given that it isn't a single encoding. Next, have you tried anything? I'd personally load the file with an XML parser, then look for options to specify the encoding when saving it... – Jon Skeet Sep 23 '16 at 08:39
  • Possible duplicate of [Encoding conversion in java](http://stackoverflow.com/questions/229015/encoding-conversion-in-java) – Martin Nyolt Sep 23 '16 at 12:18
  • This may be of assistance: http://stackoverflow.com/questions/229015/encoding-conversion-in-java – David J Eddy Sep 23 '16 at 13:33
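Following the XML-parser suggestion, here is a minimal sketch that uses only the JDK's built-in javax.xml APIs (DocumentBuilderFactory, plus Transformer with OutputKeys.ENCODING). The class name XmlReencoder and the method reencode are illustrative and do not come from the thread. The parser works out UTF-16/UCS-2 input from the BOM or the XML declaration, and the serializer writes an encoding declaration that matches the target charset. A plain text-level conversion would instead leave whatever encoding declaration the file already had.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XmlReencoder {

    /**
     * Parses an XML document from raw bytes (the parser detects UTF-16/UCS-2 from
     * the BOM or XML declaration) and serializes it again in targetEncoding.
     */
    public static byte[] reencode(byte[] xml, String targetEncoding) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Controls both the output bytes and the encoding="..." in the XML declaration.
        transformer.setOutputProperty(OutputKeys.ENCODING, targetEncoding);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the UCS-2LE input: a BOM plus a small document, encoded as UTF-16LE.
        String text = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-16\"?><root>caf\u00e9</root>";
        byte[] input = text.getBytes(Charset.forName("UTF-16LE"));

        byte[] utf8 = reencode(input, "UTF-8");
        System.out.println(new String(utf8, Charset.forName("UTF-8")));
    }
}
```

For real files, you could parse from a FileInputStream and pass a FileOutputStream to StreamResult instead of the byte arrays. DOM keeps the whole document in memory, which is usually acceptable for moderately sized files.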

1 Answer


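I had to do something similar, and this is what I came up with (I removed a couple of methods, but this should be enough for your use case). BTW, UCS-2 is effectively UTF-16 restricted to the Basic Multilingual Plane: for those characters the bytes are identical, so a UCS-2 file can be read as UTF-16 with the same byte order (UTF-16LE for UCS-2LE, UTF-16BE for UCS-2BE).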
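Before the converter itself, here is a small self-contained check of the UCS-2 point. It assumes a UCS-2LE file that starts with the FF FE byte order mark, and the class Ucs2Demo is illustrative. The JDK does not appear to register a charset called UCS-2LE (hence the UnsupportedEncodingException mentioned in the comments), but the bytes decode correctly as UTF-16. The choice of charset name matters for the BOM. According to the java.nio.charset.Charset documentation, "UTF-16" reads the BOM to choose the byte order and drops it, while "UTF-16LE" treats it as an ordinary U+FEFF character.

```java
import java.nio.charset.Charset;

public class Ucs2Demo {

    // FF FE (little-endian BOM) followed by "Hi" as 16-bit little-endian code units.
    public static final byte[] SAMPLE = { (byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0 };

    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String viaUtf16 = decode(SAMPLE, "UTF-16");     // BOM detected and removed
        String viaUtf16le = decode(SAMPLE, "UTF-16LE"); // BOM kept as U+FEFF

        System.out.println(viaUtf16.length());   // 2
        System.out.println(viaUtf16le.length()); // 3
        System.out.println(viaUtf16le.charAt(0) == '\uFEFF'); // true
    }
}
```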

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;



enum EncodingType { 
    UTF8(0),
    UTF16BE(1), 
    UTF16LE(2), 
    ISO_8859_1(3),
    ISO_8859_2(4),
    UNKNOWN(5);
    private final int val;
    EncodingType(int val){ 
        this.val= val;
    }
    public int getIntValue(){
        return val;
    }
}

public class TextConverter{

    private EncodingType inputEncoding = EncodingType.UTF8;
    private EncodingType outputEncoding = EncodingType.UTF8;

    public final static String[] encodingNames = { "UTF-8","UTF-16BE","UTF-16LE", "ISO-8859-1","ISO-8859-2", "UNKNOWN" };

    // The check methods are only needed for guessing a file's encoding from its leading bytes (BOM).
    // Don't fully rely on them: not every encoding has header bytes, and a file's encoding can be changed.
    private static boolean checkUTF8(byte[] header) {
        return header.length >= 3
                && (header[0] & 0xFF) == 0xEF && (header[1] & 0xFF) == 0xBB && (header[2] & 0xFF) == 0xBF;
    }
    private static boolean checkUTF16BE(byte[] header) {
        return header.length >= 2 && (header[0] & 0xFF) == 0xFE && (header[1] & 0xFF) == 0xFF;
    }
    private static boolean checkUTF16LE(byte[] header) {
        // FF FE: both bytes are compared using the full 0xFF mask
        return header.length >= 2 && (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xFE;
    }
    public EncodingType getInputEncoding(){
        return inputEncoding;
    }
    public EncodingType getOutputEncoding(){
        return outputEncoding;
    }
    public void setInputEncoding(EncodingType enc){
        this.inputEncoding = enc;
    }
    public void setOutputEncoding(EncodingType enc){
        this.outputEncoding = enc;
    }

    /**
     * Writes a string to a file using the encoding stored in the outputEncoding member variable.
     * @param fileName path of the file to write
     * @param content text to write
     * @throws IOException if the file cannot be written or the encoding name is not supported
     */
    public void writeFile(String fileName, String content) throws IOException {
        FileOutputStream fos = new FileOutputStream(new File(fileName));
        BufferedWriter bw = null;
        try {
            bw = new BufferedWriter(new OutputStreamWriter(fos, encodingNames[outputEncoding.getIntValue()]));
            bw.write(content);
        } finally {
            // closing bw also closes fos; close fos directly if the writer was never created
            if (bw != null)
                bw.close();
            else
                fos.close();
        }
    }
    /**
     * Reads a file into a string using the encoding stored in the inputEncoding member variable.
     * Use setInputEncoding(EncodingType) to choose the encoding first.
     * readLine() drops the original line terminators, so every line (including the last)
     * ends with the platform line separator. With UTF-16LE/UTF-16BE, a leading BOM is kept
     * as a U+FEFF character at the start of the string (writing that as UTF-8 gives EF BB BF).
     * @param fileName path of the file to read
     * @return the file content
     * @throws IOException if the file cannot be read or the encoding name is not supported
     */
    public String readFile(String fileName) throws IOException {
        StringBuilder fileContent = new StringBuilder();
        String del = System.getProperty("line.separator");
        String encoding = encodingNames[inputEncoding.getIntValue()];

        FileInputStream fis = new FileInputStream(new File(fileName));
        BufferedReader br = null;
        try {
            br = new BufferedReader(new InputStreamReader(fis, encoding));
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                fileContent.append(line).append(del);
            }
        } finally {
            // closing br also closes fis; close fis directly if the reader was never created
            if (br != null)
                br.close();
            else
                fis.close();
        }
        return fileContent.toString();
    }

}

You can also make all the members static because that would probably make more sense for you. Of course, you can modify this code in any way you find suitable.
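Separately from TextConverter, if you only need a whole-file conversion, a shorter alternative is to stream characters from a reader to a writer in fixed-size chunks. This minimal sketch is not part of the original answer, and EncodingCopier, convert, and convertFile are illustrative names. Unlike readFile, it keeps the original line endings and never holds the whole file in memory. It also sticks to Java 6 APIs (no try-with-resources). Using "UTF-16" as the input charset lets Java consume a leading BOM. For a file without a BOM, name the byte order explicitly (UTF-16LE or UTF-16BE), because "UTF-16" then defaults to big-endian. Any encoding declaration inside an XML file is copied unchanged.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

public class EncodingCopier {

    /** Decodes 'in' with 'from', re-encodes it to 'out' with 'to', and closes both streams. */
    public static void convert(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        try {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } finally {
            try {
                writer.close(); // flushes the encoder and closes out
            } finally {
                reader.close(); // closes in
            }
        }
    }

    public static void convertFile(String src, Charset from, String dst, Charset to)
            throws IOException {
        InputStream in = new FileInputStream(src);
        OutputStream out;
        try {
            out = new FileOutputStream(dst);
        } catch (IOException e) {
            in.close(); // don't leak the input stream if the output can't be opened
            throw e;
        }
        convert(in, from, out, to);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java EncodingCopier <ucs2-input> <utf8-output>");
            return;
        }
        convertFile(args[0], Charset.forName("UTF-16"), args[1], Charset.forName("UTF-8"));
    }
}
```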

dsp_user
  • Hello, thank you for your reply. I am getting a java.io.UnsupportedEncodingException: UCS-2LE error. I am using Java 6. What other solution is there to fix this? – Kushal Karia Sep 23 '16 at 10:15
  • You should use either UTF-16LE or UTF-16BE. If you look at my code closely, you'll see that the encoding names passed to Java are the strings defined in encodingNames, not the EncodingType enum values. – dsp_user Sep 23 '16 at 10:17
  • Hello, I have set the input encoding to UTF-16LE and the output encoding to UTF-8. The file was generated, but if I open it in Notepad++, the encoding is still shown as UCS-2LE. – Kushal Karia Sep 23 '16 at 11:11
  • Check the input and output files with a hex editor to see whether they differ. You can't rely on Notepad++, or any other text editor for that matter, to determine the encoding reliably. Also, try opening the generated file in Notepad++ as UTF-8 to see whether the content displays correctly. – dsp_user Sep 23 '16 at 14:26
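To follow the hex-editor advice without leaving Java, here is a minimal sketch that prints the first 16 bytes of each file given on the command line and names the byte order mark, if any. BomInspector, describeBom, and toHex are illustrative names, and the checks mirror the answer's checkUTF8/checkUTF16BE/checkUTF16LE logic. A "none" result only means there is no BOM and does not by itself identify the encoding. For example, an XML prolog in BOM-less UTF-8 starts with 3C 3F 78 6D 6C ("<?xml"), whereas in UTF-16LE it starts with 3C 00 3F 00.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BomInspector {

    /** Names the BOM in the first len bytes of header, or returns "none". */
    public static String describeBom(byte[] header, int len) {
        if (len >= 3 && (header[0] & 0xFF) == 0xEF && (header[1] & 0xFF) == 0xBB && (header[2] & 0xFF) == 0xBF)
            return "UTF-8 BOM";
        if (len >= 2 && (header[0] & 0xFF) == 0xFE && (header[1] & 0xFF) == 0xFF)
            return "UTF-16BE BOM";
        if (len >= 2 && (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xFE)
            return "UTF-16LE BOM";
        return "none";
    }

    /** Formats the first len bytes as space-separated uppercase hex, e.g. "FF FE 3C 00". */
    public static String toHex(byte[] bytes, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] header = new byte[16];
            int len;
            InputStream in = new FileInputStream(name);
            try {
                len = in.read(header); // may return fewer than 16 bytes
            } finally {
                in.close();
            }
            if (len < 0) len = 0; // empty file
            System.out.println(name + ": " + toHex(header, len) + " (" + describeBom(header, len) + ")");
        }
    }
}
```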