4

I want to parse the lines of a file using parsingMethod.

test.csv

 Frank George,Henry,Mary / New York,123456
,Beta Charli,"Delta,Delta Echo
", 25/11/1964, 15/12/1964,"40,000,000.00",0.0975,2,"King, Lincoln ",Alpha

This is how I read the lines:

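    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // The class declaration was not shown in the original post; the name is a placeholder.
    public class CsvParser {

        public static void main(String[] args) throws Exception {
            File file = new File("C:\\Users\\test.csv");
            // try-with-resources closes the reader even if parsing throws
            try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
                String line2;
                // Each physical line is parsed on its own
                while ((line2 = reader.readLine()) != null) {
                    String[] tab = parsingMethod(line2, ",");
                    for (String i : tab) {
                        System.out.println(i);
                    }
                }
            }
        }

        public static String[] parsingMethod(String line, String parser) {
            List<String> liste = new LinkedList<String>();
            // Group 2: an unquoted field; group 3: the content of a quoted field
            String patternString = "(([^\"][^" + parser + "]*)|\"([^\"]*)\")" + parser + "?";
            Pattern pattern = Pattern.compile(patternString);
            Matcher matcher = pattern.matcher(line);

            while (matcher.find()) {
                if (matcher.group(2) != null) {
                    liste.add(matcher.group(2).replace("\n", "").trim());
                } else if (matcher.group(3) != null) {
                    liste.add(matcher.group(3).replace("\n", "").trim());
                }
            }

            String[] result = new String[liste.size()];
            return liste.toArray(result);
        }
    }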

Output :

Frank George
Henry
Mary / New York
123456

Beta Charli
Delta
Delta Echo
"
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
"
Alpha
Delta
Delta Echo

I want to remove this stray " from the output. Can anyone help me improve my pattern?


Expected output

Frank George
Henry
Mary / New York
123456
Beta Charli
Delta
Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
Alpha
Delta
Delta Echo

Output for line 3

25/11/1964
15/12/1964

40
000
000.00


0.0975
2

King
Lincoln
GameBuilder

3 Answers

2

Your code didn't compile as posted, but only because some of the " characters were not escaped.

But this should do the trick:

String patternString = "(?:^.,|)([^\"]*?|\".*?\")(?:,|$)";
Pattern pattern = Pattern.compile(patternString, Pattern.MULTILINE);

(?:^.,|) is a non-capturing group that matches either a single character followed by a comma at the start of the line, or nothing at all.

([^\"]*?|\".*?\") is a capturing group that matches either a run of characters containing no ", or anything between a pair of " characters.

(?:,|$) is a non-capturing group that matches a comma or the end of the line.

Note: ^ and $ only work as stated when the pattern is compiled with the Pattern.MULTILINE flag
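
Because a pattern like this has only one capturing group, the field text has to be read from `group(1)`; asking for `group(2)` throws `IndexOutOfBoundsException: No group 2`. The following is a minimal sketch of how `groupCount()` and `group(n)` behave, not a fix for the CSV problem itself. The class name, the helper `firstParticipatingGroup` and the two-group pattern are illustrative assumptions, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountSketch {

    // Hypothetical helper: returns the first capturing group that took part
    // in the current match, or null if none did.
    static String firstParticipatingGroup(Matcher matcher) {
        // groupCount() is the number of capturing groups in the pattern,
        // regardless of which of them matched.
        for (int g = 1; g <= matcher.groupCount(); g++) {
            if (matcher.group(g) != null) {
                return matcher.group(g);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative pattern with two capturing groups:
        // group 1 = content of a quoted field, group 2 = an unquoted field.
        Pattern pattern = Pattern.compile("\"([^\"]*)\"|([^,\"]+)");
        Matcher matcher = pattern.matcher("Alpha,\"King, Lincoln\"");
        while (matcher.find()) {
            System.out.println(firstParticipatingGroup(matcher));
        }
    }
}
```

Looping up to `groupCount()` and checking for `null` avoids hard-coding group numbers that may not exist in a given pattern.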

CloudyMarble
B8vrede
  • I am not good with patterns. I have now corrected my code (see the question). I put your patternString into my code and it gives me the error `java.lang.IndexOutOfBoundsException: No group 2` – GameBuilder May 15 '13 at 10:10
  • This pattern returns one group at a time, so there is no group 2. To check whether a group exists, use `matcher.groupCount()` – B8vrede May 15 '13 at 10:20
  • And if you really want to have multiple groups, use this: `String patternString = "(?:(?:^.,|)([^\"]*?|\".*?\")(?:,|$))+";` (I didn't test this one, but it should work) – B8vrede May 15 '13 at 10:21
  • @B8vrede: Not working, same error: java.lang.IndexOutOfBoundsException: No group 2 – GameBuilder May 15 '13 at 10:32
  • Use `if(matcher.groupCount() >= 2){ liste.add(matcher.group(2).replace("\n","").trim()); }else if(matcher.groupCount() >= 3){ liste.add(matcher.group(3).replace("\n","").trim()); }` it will check if there is a group 2 and if it's there use it. Same for 3. – B8vrede May 15 '13 at 10:56
  • @B8vrede: But the groupCount is one, so nothing is added to liste. – GameBuilder May 15 '13 at 11:19
1

I can't reproduce your result but I'm thinking maybe you want to leave the quotes out of the second captured group, like this:

"(([^\"][^"+parser+ "]*)|\"([^\"]*))\"" +parser+"?"

Edit: Sorry, this won't work. Maybe you want the first group to match any number of characters that are neither a comma nor a quote, like this: (([^,\"]*)|\"([^\"]*)\"),?

anana
  • I'm sorry but I don't understand what you're saying. If my solution didn't work I'm sure someone else will bother to spoonfeed it to you. – anana May 15 '13 at 10:34
  • Please check my code. With the patternString in my code I cannot parse the " character. I want to remove the " as well. And with your patternString my output looks like , , ,000.00 , – GameBuilder May 15 '13 at 10:42
  • The problem is I can't reproduce your result. Can you output line 3 as it is passed to parsingMethod? What if you change the pattern to this: `(([^,\"]*)|\"([^\"]*)\"),?` – anana May 15 '13 at 10:59
  • That makes no sense to me. Try to output line 3 so I can reproduce your results. – anana May 15 '13 at 11:14
  • I don't mean the output but the input, the String that is given as an argument to parsingMethod(). – anana May 15 '13 at 11:25
  • ", 25/11/1964, 15/12/1964,"40,000,000.00",0.0975,2,"King, Lincoln ",Alpha This is the string for which the output is shown in the edited section of the question – GameBuilder May 15 '13 at 11:26
1

As far as I can see, the lines are related, so try this:

    public static void main(String[] args) throws Exception {

        File file = new File("C:\\Users\\test.csv");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        StringBuilder line = new StringBuilder();
        String lineRead;
        while ((lineRead = reader.readLine()) != null) {
            line.append(lineRead);
        }
        String[] tab = parsingMethod(line.toString());
        for (String i : tab) {
            System.out.println(i);
        }


    }

    public static String[] parsingMethod(String line) {

        List<String> liste = new LinkedList<String>();
        String patternString = "(([^\"][^,]*)|\"([^\"]*)\"),?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(line);

        while (matcher.find()) {
            if (matcher.group(2) != null) {
                liste.add(matcher.group(2).replace("\n", "").trim());
            } else if (matcher.group(3) != null) {
                liste.add(matcher.group(3).replace("\n", "").trim());
            }
        }

        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }

Output:

Frank George
Henry
Mary / New York
123456
Beta Charli
Delta,Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King, Lincoln
Alpha

Since Delta,Delta Echo is inside quotation marks, it should appear on a single line, just like King, Lincoln.
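
If the file is too large to concatenate in one go, a middle ground is to keep reading line by line but hold a line back while it contains an unclosed quote, and only parse once the quotes balance. The sketch below is one possible way to do that, under several assumptions: a quoted field may span physical lines, escaped quotes (`""`) inside a field are not handled, and blank fields are dropped. The class name, `joinRecords`, `parseRecord`, `hasOpenQuote` and the simplified pattern are all hypothetical and not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CsvRecordSketch {

    // Group 1: content of a quoted field; group 2: a non-empty unquoted field.
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"|([^,\"]+)");

    // True if the text contains an odd number of quotes, i.e. a quote is still open.
    static boolean hasOpenQuote(String text) {
        int quotes = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '"') {
                quotes++;
            }
        }
        return quotes % 2 != 0;
    }

    // Merges physical lines into logical records: a line is held back
    // until every quote opened in it has been closed.
    static List<String> joinRecords(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String line : physicalLines) {
            if (pending.length() > 0) {
                pending.append('\n');
            }
            pending.append(line);
            if (!hasOpenQuote(pending.toString())) {
                records.add(pending.toString());
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            records.add(pending.toString()); // unterminated quote: keep what we have
        }
        return records;
    }

    // Splits one logical record into trimmed fields, keeping quoted commas intact.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(record);
        while (matcher.find()) {
            String raw = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            String value = raw.replace("\n", "").trim();
            if (!value.isEmpty()) {
                fields.add(value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The three physical lines of test.csv from the question, inlined here.
        List<String> lines = Arrays.asList(
            " Frank George,Henry,Mary / New York,123456",
            ",Beta Charli,\"Delta,Delta Echo",
            "\", 25/11/1964, 15/12/1964,\"40,000,000.00\",0.0975,2,\"King, Lincoln \",Alpha");
        for (String record : joinRecords(lines)) {
            for (String field : parseRecord(record)) {
                System.out.println(field);
            }
        }
    }
}
```

With the sample lines, lines 2 and 3 are merged into one record before parsing, so the quote at the start of line 3 is treated as the close of the field opened on line 2 rather than as the start of a new one.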

  • Your patternString and mine are the same. You only get this output when you treat the whole file content as one String. You have to read the file line by line and pass each line to parsingMethod. – GameBuilder May 15 '13 at 12:22
  • Sorry mate, but you said that the quotation mark at the beginning of line 3 closes the quotation started on line 2, which tells me the lines are related. If not, you don't know what you want! – Ricardo Cacheira May 15 '13 at 14:36
  • Yes, the quotation mark at the beginning of line 3 closes the quotation started on line 2. But it is line 3, and I have to parse each line separately. That is why I am reading each line and parsing it with the method. Sorry for not being clear. I hope you understand now and can help me solve this. – GameBuilder May 15 '13 at 16:14
  • It doesn't make sense to me, but OK! Just so I understand, can you tell me what those lines are and what you want to do? I'll try to help – Ricardo Cacheira May 15 '13 at 23:23
  • What produced the lines in test.csv? – Ricardo Cacheira May 16 '13 at 01:25