Java: StringTokenizer does not respect separator

Question

I have the following code that extracts tab-separated strings into a string array:

public static List<String> getContents(File aFile, String separator) {
    // all strings, split based on separator
    List<String> contentList = new ArrayList<String>();
    StringTokenizer tokenizer = new StringTokenizer(Util.getContents(aFile), separator);
    while (tokenizer.hasMoreTokens()) {
        contentList.add(tokenizer.nextToken());
    }
    return contentList;
}

The separator in this case is "\t".

As long as two strings are separated by one tab, everything is great. However, my dataset sometimes has two strings separated by two tabs. This means that one parameter is missing and an empty string should be added to the list. However, the method ignores that and just returns an array with one string fewer.

In my particular case, I always want an array of 5 strings back. That means, a text containing only 4 tabs with no text returns an array of 5 empty strings (needed for a parsing job that is based on that). Unfortunately, I have no control over the content and I am working with millions of files that are generated out of my control.

Is there a better way to do this with StringTokenizer? Or do I have to implement something on my own?

Here are some examples:

String ok = "a\tb\tc\td\te";
String nok = "a\tb\tc\t\te";

Ralf

score 0 · Answer 1 · edited May 23 '17 at 12:11

I found this: How to split a string in Java. It shows that I can do it with

"myString".split("\t", -1);

to obtain the empty strings when multiple separators cluster in one place. The negative limit also keeps trailing empty strings.
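To make the difference concrete, here is a minimal sketch comparing the two approaches on the example strings from the question. The class name SplitDemo and the helper names tokenize and splitKeepingEmpty are hypothetical and only serve the illustration. Note that String.split takes a regular expression, so the sketch wraps the separator in Pattern.quote on the assumption that it should be matched literally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class SplitDemo {

    // Hypothetical helper: splits on a literal separator and keeps empty fields.
    public static List<String> splitKeepingEmpty(String content, String separator) {
        // String.split expects a regex, so quote the separator to match it literally.
        // A negative limit keeps empty strings, including trailing ones.
        return new ArrayList<>(Arrays.asList(content.split(Pattern.quote(separator), -1)));
    }

    // The approach from the question, for comparison: StringTokenizer skips empty tokens.
    public static List<String> tokenize(String content, String separator) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(content, separator);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String nok = "a\tb\tc\t\te";
        System.out.println(tokenize(nok, "\t").size());                  // 4
        System.out.println(splitKeepingEmpty(nok, "\t").size());         // 5
        System.out.println(splitKeepingEmpty("\t\t\t\t", "\t").size());  // 5
    }
}
```

One design difference to keep in mind: StringTokenizer treats every character of its delimiter string as a separate delimiter, while split matches the whole separator as one sequence. For a single "\t" the two agree on where fields begin and end.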

Thanks anyway!

answered Mar 24 '14 at 16:14

RalfB
