
I am trying to split the lines of a very large .csv file (255 columns), using a BufferedReader that reads each line into a String.

I'd like to split each line on a comma that is followed by a letter or digit. Ex:

1,2,3,4,5,6,7 will split into
1 | 2 | 3 | 4 | 5 | 6 | 7

hello,world,good day to you, Sir,test will split into
hello | world | good day to you, Sir | test

Notice how I only split on a comma that is followed by an alphanumeric character. Commas that precede a space are not split on; they stay part of the sentence.

Pshemo
Joe

3 Answers


For each string a:
a.split(",(?=\\S)");
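To see what the lookahead does, here is a minimal sketch (the class name `NonSpaceSplitDemo` and the helper `splitFields` are made up for illustration). `(?=\\S)` is zero-width: the comma is consumed as the delimiter only when the next character is not whitespace, and that next character stays in the following field.

```java
public class NonSpaceSplitDemo {
    // Hypothetical helper: splits on every comma that is followed by a non-whitespace character.
    static String[] splitFields(String a) {
        return a.split(",(?=\\S)");
    }

    public static void main(String[] args) {
        String[] fields = splitFields("hello,world,good day to you, Sir,test");
        System.out.println(String.join(" | ", fields));
        // prints: hello | world | good day to you, Sir | test
    }
}
```

Note that `\S` matches any non-whitespace character, not only letters and digits, so a line such as `a,-1` is split as well.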

nldoty

To split on a comma followed by an alphanumeric char you may use

String pattern = ",(?=\\p{Alnum})";

Or, if you plan to support any Unicode letters, enable the Pattern.UNICODE_CHARACTER_CLASS option by prefixing the pattern with the embedded flag (?U):

String pattern = "(?U),(?=\\p{Alnum})";
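The difference between the two patterns only shows up with non-ASCII text. The sketch below is illustrative (the class and method names are invented here): without `(?U)`, `\p{Alnum}` covers ASCII letters and digits only, so a comma before `é` is left alone; with `(?U)` it is treated as a delimiter.

```java
public class UnicodeSplitDemo {
    // ASCII-only: \p{Alnum} matches [A-Za-z0-9] by default.
    static String[] splitAscii(String s) {
        return s.split(",(?=\\p{Alnum})");
    }

    // Unicode-aware: (?U) turns on Pattern.UNICODE_CHARACTER_CLASS for this pattern.
    static String[] splitUnicode(String s) {
        return s.split("(?U),(?=\\p{Alnum})");
    }

    public static void main(String[] args) {
        // "café,été,test" written with escapes so the source file encoding does not matter.
        String s = "caf\u00e9,\u00e9t\u00e9,test";
        System.out.println(splitAscii(s).length);   // prints: 2 (the comma before é is not split)
        System.out.println(splitUnicode(s).length); // prints: 3
    }
}
```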

See the RegexPlanet regex demo.

Java demo:

String s = "hello,world,good day to you, Sir,test,1,2";
String[] result = s.split(",(?=\\p{Alnum})");
for (String r : result) {
    System.out.println(r);
}

Output:

hello
world
good day to you, Sir
test
1
2
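
Since the question reads the file line by line with a BufferedReader, the same pattern can be applied to each line as it is read. This is only a rough sketch: the class name `CsvLineSplitter` and the helper `readRows` are hypothetical, and a StringReader stands in for the real file so the example runs as is. The pattern is compiled once and reused, which may matter for a very large file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CsvLineSplitter {
    // Compiled once and reused for every line.
    private static final Pattern COMMA_BEFORE_ALNUM = Pattern.compile(",(?=\\p{Alnum})");

    // Hypothetical helper: reads every line from the reader and splits it into fields.
    static List<String[]> readRows(BufferedReader reader) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(COMMA_BEFORE_ALNUM.split(line));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file contents (assumption for the demo).
        String data = "1,2,3\nhello,world,good day to you, Sir,test\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            for (String[] row : readRows(reader)) {
                System.out.println(String.join(" | ", row));
            }
        }
        // prints:
        // 1 | 2 | 3
        // hello | world | good day to you, Sir | test
    }
}
```

Collecting all rows in a list is done here for brevity; for a very large file it is probably better to process each row inside the loop instead.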
Wiktor Stribiżew

The linked answer explains how lookahead and lookbehind work. Here is some code that I believe solves the problem you describe:

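import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitDemo {

    private static String[] mySplit(final String line, final char separator) {
        // Quote the separator so that regex metacharacters such as '|' are matched literally.
        final String marker = Pattern.quote(String.valueOf(separator)) + "\\w";
        // Split just before and just after every "separator + word character" pair,
        // so that each pair becomes a two-character token of its own.
        final String[] split = line.split("(?<=" + marker + ")|(?=" + marker + ")");

        final List<String> list = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        for (String token : split) {
            if (token.matches(marker)) {
                // A marker token closes the current field and starts the next one
                // with the word character that followed the separator.
                list.add(field.toString());
                field = new StringBuilder().append(token.charAt(1));
            } else {
                // Any other token is a continuation of the current field.
                field.append(token);
            }
        }
        list.add(field.toString());

        return list.toArray(new String[0]);
    }

    private static String concatenate(final String[] tokens, final char separator) {
        // Join the tokens with the new separator between them.
        return String.join(String.valueOf(separator), tokens);
    }

    public static void main(String[] args) {
        final String line = "hello,world,good day to you, Sir,test";
        final String[] tokens = mySplit(line, ',');
        final String newLine = concatenate(tokens, '|');
        System.out.println("newLine = " + newLine);
        // prints: newLine = hello|world|good day to you, Sir|test
    }
}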
Jorge Garcia