Given the following text data, I am seeing strange capturing-group behavior. When I iterate over all tables, only the last row of data in each table is captured. Is there a way to keep every capture of a repeated group (all the values of each table), not only the last row?

I am using this pattern (?<tabname>\S+)\n\=*\n(?:(\d+)\ *\|\ *(\d+)\n)+

TABLE1
=======
1  | 2
15 | 2
3  | 15

TABLE2
=======
3  | 5
12 | 2
17 | 7

Edit: Sorry for the unclear question; here are my expected and actual outputs:

Expected output would be:

Match 1 of 2:

Group "tabname":    TABLE1
Group 2:    1
Group 3:    2
Group 4:    15
Group 5:    2
Group 6:    3
Group 7:    15

Match 2 of 2:

Group "tabname":    TABLE2
Group 2:    3
Group 3:    5
Group 4:    12
Group 5:    2
Group 6:    17
Group 7:    7

But actual output is:

Match 1 of 2:

Group "tabname":    TABLE1
Group 2:    3
Group 3:    15

Match 2 of 2:

Group "tabname":    TABLE2
Group 2:    17
Group 3:    7
calaedo
  • What is the regex flavor/language? Did you mean you have something like [`(?<tabname>\S+)\n\S*\n(?:(\d+)\s*\|\s*(\d+)(?:$|\n))*`](https://regex101.com/r/zY6pS1/2)? Note that in Java, a repeated capture group is overwritten on every iteration, and only the last value is kept. – Wiktor Stribiżew May 24 '16 at 10:56
  • @WiktorStribiżew Java – calaedo May 24 '16 at 10:57
  • what are you trying to capture? – rock321987 May 24 '16 at 10:57
  • @rock321987 Tablename with all the data as numbered capturing groups – calaedo May 24 '16 at 10:58
  • *with all the data as numbered capturing groups* - could you please add the exact expected output to the question, please? – Wiktor Stribiżew May 24 '16 at 10:59
  • if I am correct, you can use **[(?s)(?:(TABLE\d+)|\G)(?:(?!TABLE).)+?(\d+)\s+\|\s+(\d+)](https://regex101.com/r/aR6wR7/2)** – rock321987 May 24 '16 at 11:04
  • the groups cannot all be numbered separately unless the number of groups is known beforehand – rock321987 May 24 '16 at 11:07
  • @rock321987: With multiple blocks of text, `\G` is not a nice solution (though some programming logic could help). Calaedo, what if Result 1 is `[TABLE1, 1, 2, 15, 2, 3, 15]` as an array of strings? – Wiktor Stribiżew May 24 '16 at 11:13
  • @calaedo: *why should the number of groups should be known before?* - see my first comment, you would need to explicitly add capturing groups in the pattern. – Wiktor Stribiżew May 24 '16 at 11:15
  • @WiktorStribiżew I would not mind, as long as the tablename is in the same array (Needed for unit based batch processing) – calaedo May 24 '16 at 11:17
  • @WiktorStribiżew can you point out the problems I may run into? I am fairly new to using `\G` and still trying to understand it completely – rock321987 May 24 '16 at 11:20
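
The behavior described in the comments above (a repeated capture group in Java keeps only its last iteration) can be reproduced with a small sketch. It runs the pattern from the question against the first table only. The class and method names (`RepeatedGroupDemo`, `groupsOfFirstMatch`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Pattern from the question: group 1 is "tabname",
    // groups 2 and 3 sit inside the repeated (?:...)+ part.
    static final Pattern TABLE =
        Pattern.compile("(?<tabname>\\S+)\\n\\=*\\n(?:(\\d+)\\ *\\|\\ *(\\d+)\\n)+");

    // Hypothetical helper: returns groups 1..groupCount() of the first match,
    // or an empty list when nothing matches.
    static List<String> groupsOfFirstMatch(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = TABLE.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "TABLE1\n=======\n1  | 2\n15 | 2\n3  | 15\n";
        // Three rows are consumed by the quantifier, but only the last row survives.
        System.out.println(groupsOfFirstMatch(input)); // [TABLE1, 3, 15]
    }
}
```

The number of groups is fixed by the pattern (three here), no matter how many rows the `+` consumes, which is why the per-row values have to be collected with a loop over matches instead.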

2 Answers

I believe you can use this regex

(?s)(?:(TABLE\d+)|\G)(?:(?!TABLE).)+?(\d+)\s+\|\s+(\d+)

Regex Demo

With a bit of Java help, you can achieve the result

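import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "TABLE1\n=======\n1  | 2\n15 | 2\n3  | 15\n\nTABLE2\n=======\n3  | 5\n12 | 2\n17 | 7";
// Group 1 (the table name) is set only for the first row of a table;
// for the following rows \G resumes at the end of the previous match.
String pattern = "(?s)(?:(TABLE\\d+)|\\G)(?:(?!TABLE).)+?(\\d+)\\s+\\|\\s+(\\d+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
int flag = 0; // 0: table name not printed yet, 1: already printed

while (m.find()) {
    if (m.group(1) != null) {
        flag = 0; // a new table starts here
    }

    if (flag == 0) {
        // First row of a table: print the name, then the two values
        System.out.println(m.group(1) + "\n" + m.group(2) + "\n" + m.group(3));
        flag = 1;
    } else {
        // Later rows: group 1 is null, so print only the values
        System.out.println(m.group(2) + "\n" + m.group(3));
    }
}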

Ideone Demo

rock321987
  • 1
    Just FYI: a tempered greedy token is very resource-consuming, and Java regex engine is prone to stack overflow issue with complex patterns (even short ones, but with quantified alternation groups). Yes, the code looks cleaner, but you would still need to declare the arrays/lists for the results. – Wiktor Stribiżew May 24 '16 at 11:28
  • @WiktorStribiżew thanks for the info..still working on my regex – rock321987 May 24 '16 at 11:29
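
As the comment above notes, the `\G` approach still needs a container for the results. A minimal sketch of that step, assuming the same pattern and sample data as in this answer; the class name `GAnchorCollector` and the method `collect` are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorCollector {
    // Same pattern as in the answer: group 1 is the table name (first row only),
    // groups 2 and 3 are the two values of one row.
    static final Pattern ROW = Pattern.compile(
        "(?s)(?:(TABLE\\d+)|\\G)(?:(?!TABLE).)+?(\\d+)\\s+\\|\\s+(\\d+)");

    // Groups the row values under the table they belong to, in input order.
    static Map<String, List<String>> collect(String text) {
        Map<String, List<String>> tables = new LinkedHashMap<>();
        List<String> current = null;
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                // A new table header was matched: switch to its list.
                current = tables.computeIfAbsent(m.group(1), k -> new ArrayList<>());
            }
            if (current != null) { // ignore rows that precede any table header
                current.add(m.group(2));
                current.add(m.group(3));
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        String line = "TABLE1\n=======\n1  | 2\n15 | 2\n3  | 15\n\nTABLE2\n=======\n3  | 5\n12 | 2\n17 | 7";
        System.out.println(collect(line));
        // {TABLE1=[1, 2, 15, 2, 3, 15], TABLE2=[3, 5, 12, 2, 17, 7]}
    }
}
```

A `LinkedHashMap` is used here only so the tables come out in the order they appear in the input.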

You can collect your data in 2 passes. The first regex will just match the tables with all the values:

"(?<tabledata>\\S+)\\s+\\S+(?<vals>[|\\d\\s]+)"

See demo. Next, we'll just match the numbers and add them to the string array (with the simple \d+ regex).

Here is a full Java demo producing [[TABLE1, 1, 2, 15, 2, 3, 15], [TABLE2, 3, 5, 12, 2, 17, 7]]:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ideone
{
    public static void main (String[] args)
    {
        String s = "TABLE1\n=======\n1  | 2\n15 | 2\n3  | 15\n\nTABLE2\n=======\n3  | 5\n12 | 2\n17 | 7";
        // Pass 1: "tabledata" holds the table name, "vals" the whole block of rows
        Pattern pattern = Pattern.compile("(?<tabledata>\\S+)\\s+\\S+(?<vals>[|\\d\\s]+)");
        // Pass 2: pull the individual numbers out of each "vals" block
        Pattern number = Pattern.compile("\\d+");
        Matcher matcher = pattern.matcher(s);
        List<List<String>> res = new ArrayList<>();
        while (matcher.find()) {
            List<String> lst = new ArrayList<>();
            // Both groups are mandatory in the pattern, so they are never null here
            lst.add(matcher.group("tabledata"));
            Matcher m = number.matcher(matcher.group("vals"));
            while (m.find()) {
                lst.add(m.group());
            }
            res.add(lst);
        }
        System.out.println(res);
    }
}
Wiktor Stribiżew
  • haha..that's how I used to think before I read(a bit) about `\G`..i know you don't want to complicate things..+1 – rock321987 May 24 '16 at 11:28
  • 1
    @rock321987: `\G` based solution of yours is also valid. Just unroll the tempered greedy quantifier (see [how it can be done here](http://stackoverflow.com/a/37343088/3832970)) – Wiktor Stribiżew May 24 '16 at 11:30
  • 1
    Just `(?:(TABLE\d++)|\G)[^T\d]*+(?:T(?!ABLE\d)[^T\d]*+)*(\d+)` should be enough. Java also supports possessive quantifiers. – Wiktor Stribiżew May 24 '16 at 12:11
  • The point is to use negated character classes (with a negative lookahead in most cases) so that the subsequent subpatterns cannot match the same character at the same location. Some unrolled patterns can be built just with the help of smart grouping and quantifiers. – Wiktor Stribiżew May 24 '16 at 12:20
  • seems like I have a lot to read today..will read when reach home and ask if there is a problem..thanks again – rock321987 May 24 '16 at 12:24
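
For reference, the unrolled pattern from the comments above could be used roughly as follows. Note that it captures a single number per match (group 2) rather than a pair, so each row yields two matches. This is only a sketch under that reading of the comment; the class name `UnrolledCollector` and the method `collect` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledCollector {
    // Unrolled pattern quoted in the comments: group 1 is the table name
    // (only on the first number of a table), group 2 is one number.
    static final Pattern NUMBER = Pattern.compile(
        "(?:(TABLE\\d++)|\\G)[^T\\d]*+(?:T(?!ABLE\\d)[^T\\d]*+)*(\\d+)");

    // Collects the numbers of each table, keyed by table name, in input order.
    static Map<String, List<String>> collect(String text) {
        Map<String, List<String>> tables = new LinkedHashMap<>();
        List<String> current = null;
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                // A table header was matched: start (or reuse) its list.
                current = tables.computeIfAbsent(m.group(1), k -> new ArrayList<>());
            }
            if (current != null) { // skip numbers that precede any table header
                current.add(m.group(2));
            }
        }
        return tables;
    }

    public static void main(String[] args) {
        String s = "TABLE1\n=======\n1  | 2\n15 | 2\n3  | 15\n\nTABLE2\n=======\n3  | 5\n12 | 2\n17 | 7";
        System.out.println(collect(s));
        // {TABLE1=[1, 2, 15, 2, 3, 15], TABLE2=[3, 5, 12, 2, 17, 7]}
    }
}
```

Because every quantifier before the final `(\d+)` is possessive, a failed attempt at the end of a table gives up immediately instead of backtracking through the skipped text.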