Java's split method has leading blank records that I can't suppress

Question

I'm parsing an input file that has multiple keywords preceded by a +. The + is my delimiter in a split, with individual tokens being written to an array. The resulting array includes a blank record in the [0] position.

I suspect that split is taking the "nothing" before the first token and populating project[0], then moving on to subsequent tokens which all show up as correct.

Documentation says that this method has a limit parameter:

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

and I found this post on SO, but the solution proposed, editing out the leading delimiter (I used a substring(1) to create a temp field) yielded the same blank record for me.
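For reference, the limit behaviour quoted above can be checked with a small sketch. The class name SplitLimitDemo and the sample strings are made up for illustration; only String.split itself comes from the JDK. It suggests that the limit argument governs trailing empty strings only, so it cannot remove a leading one.

```java
import java.util.Arrays;

// Hypothetical demo class; the sample inputs are illustrative, not from the original post.
class SplitLimitDemo {
    public static void main(String[] args) {
        // Leading delimiter: the empty string before the first "+" is kept, whatever the limit.
        System.out.println(Arrays.toString("+a+b".split("\\+")));      // [, a, b]
        System.out.println(Arrays.toString("+a+b".split("\\+", 0)));   // [, a, b]

        // Trailing delimiter: limit 0 (the default) discards the trailing empty string...
        System.out.println(Arrays.toString("a+b+".split("\\+")));      // [a, b]
        // ...while a negative limit keeps it.
        System.out.println(Arrays.toString("a+b+".split("\\+", -1)));  // [a, b, ]
    }
}
```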

Code and output appear below. Any tips would be appreciated.

import java.util.regex.*;
import java.io.*;
import java.nio.file.*;
import java.lang.*;
//
public class eadd
{
    public static void main(String args[])
    {
        String projStrTemp = "";
        String projString = "";
        String[] project = new String[10];
        int contextSOF = 0;
        int projStringSOF = 0;
        int projStringEOF = 0;
       //
        String inputLine = "foo foofoo foo foo @bar.com +foofoofoo +foo1 +foo2 +foo3";
        contextSOF = inputLine.indexOf("@");
        int tempCalc = (inputLine.indexOf("+")) ;
        if (tempCalc == -1) {
            projStrTemp = "+Uncategorized";
        } else {
            projStringSOF = inputLine.indexOf("+",contextSOF);
            projStrTemp = inputLine.trim().substring(projStringSOF).trim();
        }
        project = projStrTemp.split("\\+");
       //
        System.out.println(projStrTemp+"\n"+projString);
        for(int j=0;j<project.length;j++) {
            System.out.println("Project["+j+"] "+project[j]);
        }
    }
}

CONSOLE OUTPUT: 
+foofoofoo +foo1 +foo2 +foo3

Project[0]
Project[1] foofoofoo
Project[2] foo1
Project[3] foo2
Project[4] foo3

ᴇʟᴇvᴀтᴇ · Answer 1 · 2012-08-09T12:49:45.857

2

Change:

projStrTemp = inputLine.trim().substring(projStringSOF).trim();

to:

projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();

If you have a leading delimiter, your array will start with a blank element. It might be worthwhile for you to experiment with split() without all the other baggage.

public static void main(String[] args) {
    String s = "an+example";

    String[] items = s.split("\\+");
    for (int i = 0; i < items.length; i++) {
        System.out.println(i + " = " + items[i]);
    }
}

With String s = "an+example"; it produces:

0 = an
1 = example

Whereas String s = "+an+example"; produces:

0 = 
1 = an
2 = example
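One way to package this advice is a small helper that drops a single leading delimiter before splitting. This is only a sketch: the class and method names (KeywordSplitter, splitKeywords) are hypothetical, and folding the whitespace before each + into the delimiter pattern is an extra assumption so that the tokens come back without trailing spaces.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the original question or answer.
class KeywordSplitter {
    // Removes one leading "+" (if present), then splits on "+" plus any whitespace before it.
    static String[] splitKeywords(String s) {
        String body = s.trim();
        if (body.startsWith("+")) {
            body = body.substring(1); // substring returns a new String, so reassign it
        }
        return body.split("\\s*\\+");
    }

    public static void main(String[] args) {
        String[] project = splitKeywords("+foofoofoo +foo1 +foo2 +foo3");
        System.out.println(Arrays.toString(project)); // [foofoofoo, foo1, foo2, foo3]
    }
}
```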

edited Aug 09 '12 at 12:49

answered Aug 09 '12 at 12:44

ᴇʟᴇvᴀтᴇ


Thanks! worked great! But doesn't that do the same thing (more elegantly, though) as creating a separate variable defined as `ProjStrTemp.substring(1)`? If not, can you help me understand why... – dwwilson66 Aug 09 '12 at 12:51
Yes, `projStrTemp = projStrTemp.substring(1)` would do the same thing. You must remember to assign it to a variable as substring doesn't modify the string itself. – ᴇʟᴇvᴀтᴇ Aug 09 '12 at 12:55
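
The point about assignment can be seen in a few lines. A minimal sketch, with a made-up sample value; the class name SubstringDemo is hypothetical.

```java
// Hypothetical demo class; the sample string is illustrative.
class SubstringDemo {
    public static void main(String[] args) {
        String projStrTemp = "+foo1 +foo2";

        projStrTemp.substring(1);               // result discarded: Strings are immutable
        System.out.println(projStrTemp);        // +foo1 +foo2

        projStrTemp = projStrTemp.substring(1); // reassigning keeps the shortened copy
        System.out.println(projStrTemp);        // foo1 +foo2
    }
}
```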

Petr · Accepted Answer · 2012-08-09T12:56:29.550

1

One simple solution would be to remove the first + from the string. This way, it won't split before the first keyword:

projStrTemp = inputLine.trim().substring(projStringSOF + 1).trim();

Edit: Personally, I'd go for a more robust solution using regular expressions. This finds all keywords preceded by +. It also requires that the + either follows a whitespace character or sits at the start of the line, so that expressions like 3+4 aren't matched.

String inputLine = "+foo 3+4 foofoo foo foo @bar.com +foofoofoo +foo1 +foo2 +foo3";
Pattern re = Pattern.compile("(\\s|^)\\+(\\w+)");
Matcher m = re.matcher(inputLine);
while (m.find()) {
    System.out.println(m.group(2));
}
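
If the keywords are needed as a collection rather than printed, the same loop can gather them into a list. A minimal sketch, assuming the pattern above; the class and method names (KeywordFinder, findKeywords) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from the answer.
class KeywordFinder {
    // "+" must follow whitespace or the start of the input; group 2 is the keyword itself.
    private static final Pattern KEYWORD = Pattern.compile("(\\s|^)\\+(\\w+)");

    static List<String> findKeywords(String line) {
        List<String> keywords = new ArrayList<>();
        Matcher m = KEYWORD.matcher(line);
        while (m.find()) {
            keywords.add(m.group(2));
        }
        return keywords;
    }

    public static void main(String[] args) {
        String inputLine = "+foo 3+4 foofoo foo foo @bar.com +foofoofoo +foo1 +foo2 +foo3";
        System.out.println(findKeywords(inputLine)); // [foo, foofoofoo, foo1, foo2, foo3]
    }
}
```

Because nothing is split, there is no leading empty element to deal with.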

edited Aug 09 '12 at 12:56

answered Aug 09 '12 at 12:50

Petr


Thanks also for the regex. I'm just getting started with, and am still a little intimidated by, regex. But I'll play with that! – dwwilson66 Aug 09 '12 at 13:04
@dwwilson66 Yes, regular expressions are a bit hard to read, but there are tools that can help a lot. Some of them are described in [Software for visually building regular expressions?](http://unix.stackexchange.com/questions/27023/software-for-visually-building-regular-expressions) – Petr Aug 09 '12 at 13:31

score 0 · Answer 3 · answered Aug 09 '12 at 12:55

+foofoofoo +foo1 +foo2 +foo3

The split method splits the string around matches of the given delimiter (+), so the resulting array has 5 elements, the first of which is an empty string. If you want the text that comes before the first +, take it from inputLine rather than from the processed projStrTemp, which is the substring starting at (and including) the first +.
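
A short sketch of both points, using the input line from the question. The class name PrefixDemo and the variable names firstPlus and prefix are illustrative only.

```java
// Hypothetical demo class built around the question's input line.
class PrefixDemo {
    public static void main(String[] args) {
        String inputLine = "foo foofoo foo foo @bar.com +foofoofoo +foo1 +foo2 +foo3";
        int firstPlus = inputLine.indexOf("+");

        // Substring starting at (and including) the first "+", as in the question.
        String projStrTemp = inputLine.substring(firstPlus);
        String[] project = projStrTemp.split("\\+");
        System.out.println(project.length);       // 5
        System.out.println(project[0].isEmpty()); // true

        // The text before the first "+" is still available from inputLine.
        String prefix = inputLine.substring(0, firstPlus).trim();
        System.out.println(prefix);               // foo foofoo foo foo @bar.com
    }
}
```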
