
I'm trying to figure out how to split each line of a two-column file (read with readLine()) when the columns can be separated by any of several delimiters (see below).

Here are all the possible delimiters (see the comments):

+--------+---------+
+ ##some text      + //some text which starts with (##) I want to exclude this row
+ 341,     222     + //comma delimited
+ 211      321     + //space delimited
+ 541      1231    + //tab delimited
+ ##some text      + //some text which starts with (##) I want to exclude this row
+ 11.3     321.11  + //double values delimited by tab
+ 331.3    33.11   + //double values delimited by space
+ 231.3,   33.1    + //double values delimited by comma
+ ##some text      + //some text which starts with (##) I want to exclude this row
+--------+---------+

I want to obtain this table:

+--------+---------+
+ 341        222   + 
+ 211        321   +
+ 541        1231  +
+ 11.3      321.11 +
+ 331.3     33.11  +
+ 231.3      33.1  +
+--------+---------+

I would be glad to find a solution to this issue.

UPDATE:

For now I have ([,\s\t;])+ (for comma, tab, space, semicolon...), but I can't figure out how to handle the ##some text lines. I tried \##\w+, but it didn't work. Any advice?

2 Answers


You can try this regex.
I have tried it and it works fine.

(\\d+\\.?\\d*),?\\s*?(\\d+\\.?\\d*)

Then replace each match with $1 and $2 (the two captured groups).

EDIT:

Try the code below:

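import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegCheck
{
    // Two numbers (integer or decimal) separated by an optional comma and optional whitespace.
    private static final Pattern TWO_PART = Pattern.compile("(\\d+\\.?\\d*),?\\s*?(\\d+\\.?\\d*)");

    // Prints both columns of a line, or reports that the line does not match.
    public static void checkString(String s)
    {
        Matcher m = TWO_PART.matcher(s);
        // matches() requires the whole line to fit the pattern
        if (m.matches()) {
            System.out.println(m.group(1) + " " + m.group(2));
        } else {
            System.out.println(s + " does not match.");
        }
    }

    public static void main(String[] args) {
        System.out.println("Parts of strings are ");
        checkString("##some text");   // comment line, rejected
        checkString("123,     4567"); // comma plus spaces
        checkString("123,   342");
        checkString("45.45   4.3");   // decimals separated by spaces
        checkString("3.78,  23.78");  // decimals separated by comma plus spaces
    }
}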

Output:

Parts of strings are
##some text does not match.
123 4567
123 342
45.45 4.3
3.78 23.78

m.group(1) gives you the first column.
m.group(2) gives you the second column.

In your code, call the checkString() method on each line you read.
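
To tie this back to the question, here is a minimal sketch of how the same pattern might be applied to input read line by line with readLine(). The class name TwoColumnReader, the method parseLine, and the in-memory sample input are assumptions made for illustration and are not part of the original code; with a real file, the StringReader would be swapped for a FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: all names here are illustrative only.
class TwoColumnReader {
    // Same pattern as in the answer above.
    private static final Pattern TWO_PART =
            Pattern.compile("(\\d+\\.?\\d*),?\\s*?(\\d+\\.?\\d*)");

    // Returns {first, second} for a data line, or null when the line does not match.
    static double[] parseLine(String line) {
        Matcher m = TWO_PART.matcher(line.trim());
        if (!m.matches()) {
            return null; // e.g. "##some text" or a blank line
        }
        return new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)) };
    }

    public static void main(String[] args) throws IOException {
        // Sample input standing in for the file (assumption for this sketch).
        String sample = "##some text\n341,     222\n211 321\n541\t1231\n11.3\t321.11\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            String line;
            while ((line = reader.readLine()) != null) {
                double[] pair = parseLine(line);
                if (pair != null) { // skip comment lines and anything else that does not match
                    System.out.println(pair[0] + " " + pair[1]);
                }
            }
        }
    }
}
```

Returning null for rejected lines keeps the reading loop simple: rows starting with ## never match the pattern, so they are dropped without a separate check.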

Pratik
  • I'm confused about where to put $1 and $2. I use the regex to split each line. My actual code is `String pair[] = s.split("([,\\s\\t;^])+");`. I tried adding `##[a-z\\s]+|([,\\s\\t;^])+`, which @garyh suggested, but I got `java.lang.ArrayIndexOutOfBoundsException: 0 array is empty`. – Apopei Andrei Ionut Nov 22 '12 at 11:19
  • I found some code that may help. Try using my regular expression as the pattern there. Here it is: http://stackoverflow.com/a/3483070/513340 – Pratik Nov 22 '12 at 12:22
  • Thanks, sir, I won't forget you :). – Apopei Andrei Ionut Nov 23 '12 at 14:26

Assuming the ASCII table borders aren't part of the input, you could try this:

##[a-z\s]+|([\d\.]+)[,\s\t]+([\d\.]+)

then replace with:

\1   \2     (or $1    $2)

Note that this doesn't allow for commas inside the numbers.
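
As a rough sketch of how this pattern might be used from Java, the snippet below matches each line against it and treats a match with no captured groups as a ## comment line to drop. The class name ColumnCleaner and the method clean are hypothetical names chosen for illustration, and the backslashes are doubled because the pattern sits in a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: all names here are illustrative only.
class ColumnCleaner {
    // The pattern from this answer, escaped for a Java string literal.
    private static final Pattern LINE =
            Pattern.compile("##[a-z\\s]+|([\\d\\.]+)[,\\s\\t]+([\\d\\.]+)");

    // Returns "first second" for a data line, or null for a comment or unrecognized line.
    static String clean(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches() || m.group(1) == null) {
            // Either nothing matched, or the "##..." alternative matched
            // and the two number groups did not participate.
            return null;
        }
        return m.group(1) + " " + m.group(2);
    }

    public static void main(String[] args) {
        String[] lines = { "##some text", "341,     222", "211 321", "231.3,   33.1" };
        for (String line : lines) {
            String cleaned = clean(line);
            if (cleaned != null) {
                System.out.println(cleaned);
            }
        }
    }
}
```

Checking m.group(1) for null is what separates the two alternatives: when the ## branch matches, neither number group captures anything.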

garyh