
I need to read a fixed-width file, where each column has a fixed width in bytes, and convert it to CSV.

Here is a sample of the input:

 ABCD  EF日本      3456    0
 ABCD  EF感じ日本 9345    1

I need this output:

AB,CD,,EF,日本,3456,,0
AB,CD,,EF,感じ日本,9345,,1

The problem is that Japanese characters are multibyte: each one takes 2 bytes, so fixed-width logic based on character counts no longer lines up with the columns.
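A minimal sketch of the difference, assuming the file is UTF-8 and using a hypothetical three-column layout (2, 6 and 4 bytes), since the sample above does not pin down the real widths. Slicing the encoded bytes and decoding each slice keeps the columns aligned; slicing the decoded String by the same numbers drifts as soon as a column holds kanji. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ByteWidthSplit {

    // Slices a record by column widths counted in BYTES of charset cs and
    // decodes each slice separately. Assumes no column boundary falls inside
    // a multibyte character (such a slice would decode to U+FFFD).
    public static List<String> byBytes(String record, int[] widths, Charset cs) {
        byte[] data = record.getBytes(cs);
        List<String> fields = new ArrayList<>();
        int offset = 0;
        for (int w : widths) {
            int end = Math.min(offset + w, data.length);
            fields.add(new String(data, offset, end - offset, cs));
            offset = end;
        }
        return fields;
    }

    // Same widths, but counted in chars: this is the logic that breaks
    // once a column holds multibyte text.
    public static List<String> byChars(String record, int[] widths) {
        List<String> fields = new ArrayList<>();
        int offset = 0;
        for (int w : widths) {
            int end = Math.min(offset + w, record.length());
            fields.add(record.substring(offset, end));
            offset = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical layout: columns of 2, 6 and 4 bytes. In UTF-8 each of
        // these kanji takes 3 bytes, so "日本" exactly fills the 6-byte column.
        String record = "AB日本3456";
        int[] widths = {2, 6, 4};
        System.out.println(byBytes(record, widths, StandardCharsets.UTF_8)); // [AB, 日本, 3456]
        System.out.println(byChars(record, widths));                        // [AB, 日本3456, ]
    }
}
```

With a 2-byte encoding such as Shift_JIS the same `byBytes` logic applies; you would pass that charset and the file's real byte widths instead.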

Can someone explain how to do this in Java, or is there a standard CSV library I can use for it?

Any script or library recommendation would be appreciated.

Ro Yo Mi
Time App
  • Possible duplicate of [What's the best way of parsing a fixed-width formatted file in Java?](http://stackoverflow.com/questions/1609807/whats-the-best-way-of-parsing-a-fixed-width-formatted-file-in-java) – Ani Menon May 01 '16 at 11:52
  • @Menon, thanks for pointing me to the proposed solutions. I tried them, but they failed for multibyte characters, so I asked again and explained the input format as well. – Time App May 01 '16 at 12:57
  • Why can't you read in the file, convert the bytes to characters first and then refer to the character positions? Then it doesn't matter how many bytes are used for a character in the first place. – vanje May 01 '16 at 13:10
  • What are "fixed width bytes"? Japanese characters aren't guaranteed to be 2 bytes in any Unicode encoding. – 一二三 May 01 '16 at 13:17
  • @TimeApp I think you got your answer, accept the answer. Read - [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers) – Ani Menon May 01 '16 at 18:54
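To illustrate the point that the byte count depends on the encoding, here is a short sketch that counts the bytes "日本" occupies in a few charsets; the class and method names are made up for the example. Shift_JIS ships with standard JDKs but is not one of the charsets every Java platform is required to support, so the code checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {

    // Returns how many bytes the text occupies when encoded with cs.
    public static int byteLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String text = "日本";
        System.out.println("chars:     " + text.length());                               // 2
        System.out.println("UTF-8:     " + byteLength(text, StandardCharsets.UTF_8));    // 6
        System.out.println("UTF-16BE:  " + byteLength(text, StandardCharsets.UTF_16BE)); // 4
        // Optional charset: only print it if this JVM provides it.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS: " + byteLength(text, Charset.forName("Shift_JIS"))); // 4
        }
    }
}
```

So "2 bytes per character" holds for Shift_JIS or UTF-16, but a UTF-8 file needs 3 bytes for each of these characters, which is why the file's actual encoding has to be known before byte widths mean anything.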

1 Answer


Sample program that splits a string into fixed-size chunks of characters (you can add the file reading and the commas yourself):

public class SplitStringIntoFixedSizeChunks {

    // Splits text into chunks of chunkSize chars (UTF-16 code units, not bytes),
    // looking at no more than the first maxLength chars. The last chunk may be shorter.
    public static String[] split(String text, int chunkSize, int maxLength) {
        char[] data = text.toCharArray();
        int len = Math.min(data.length, maxLength);
        // Ceiling division: number of chunks needed to cover len chars
        String[] result = new String[(len + chunkSize - 1) / chunkSize];
        int chunkIndex = 0;
        for (int i = 0; i < len; i += chunkSize) {
            result[chunkIndex] = new String(data, i, Math.min(chunkSize, len - i));
            chunkIndex++;
        }
        return result;
    }

    public static void main(String[] args) {
        String x = "ABCD EF日本 3456 0 ABCD EF感じ日本 9345 1";
        String[] chunks = split(x, 2, x.length());
        // Print each chunk followed by a comma
        for (String chunk : chunks) {
            System.out.print(chunk);
            System.out.print(",");
        }
    }
}

Output:

AB,CD, E,F日,本 ,34,56, 0, A,BC,D ,EF,感じ,日本, 9,34,5 ,1,

Note: If you get "error: unmappable character for encoding ASCII", run export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8 before compiling.
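The answer leaves file reading and comma output to the reader. One possible sketch, assuming the file's charset and the byte widths are known up front (the widths, sample lines and class name below are hypothetical): read each line with an explicit charset instead of relying on file.encoding, re-encode it to bytes, slice by byte widths, trim the padding and join with commas. It also assumes that decoding and re-encoding a line gives back the original bytes, which holds for valid UTF-8 but is worth checking for other encodings.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FixedWidthToCsv {

    // Converts each fixed-width line of the file into a comma-separated line.
    // Widths are byte counts in charset cs; blank columns become empty fields.
    public static List<String> convert(Path file, int[] widths, Charset cs) throws IOException {
        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(file, cs)) {
            byte[] bytes = line.getBytes(cs);
            List<String> fields = new ArrayList<>();
            int offset = 0;
            for (int w : widths) {
                int end = Math.min(offset + w, bytes.length);
                // Decode just this column's bytes and drop the padding spaces
                fields.add(new String(bytes, offset, end - offset, cs).trim());
                offset = end;
            }
            out.add(String.join(",", fields));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: columns of 2, 2, 9 and 4 bytes in UTF-8.
        Path tmp = Files.createTempFile("fixed", ".txt");
        Files.write(tmp, List.of("ABCD日本   3456", "ABCD感じ   9345"), StandardCharsets.UTF_8);
        for (String csv : convert(tmp, new int[] {2, 2, 9, 4}, StandardCharsets.UTF_8)) {
            System.out.println(csv); // AB,CD,日本,3456 then AB,CD,感じ,9345
        }
        Files.delete(tmp);
    }
}
```

Fields containing commas or quotes would still need CSV quoting; a CSV library can handle that step once the columns are extracted.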

Ani Menon