I have the following code that extracts tab-separated strings into a list of strings:
    public static List<String> getContents(File aFile, String separator) {
        // all strings, split based on separator
        List<String> contentList = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(Util.getContents(aFile), separator);
        while (tokenizer.hasMoreTokens()) {
            contentList.add(tokenizer.nextToken());
        }
        return contentList;
    }
The separator in this case is "\t".
As long as two strings are separated by one tab, everything is great. However, my dataset sometimes has two strings separated by two tabs. This means that one parameter is missing and an empty string should be added to the list. However, the method ignores the empty field and just returns a list with one string fewer.
In my particular case, I always want a list of 5 strings back. That means a text containing only 4 tabs and no other text should return a list of 5 empty strings (a parsing job downstream depends on that). Unfortunately, I have no control over the content, and I am working with millions of files that are generated outside my control.
Is there a better way to do this with StringTokenizer? Or do I have to implement something on my own?
Here are some examples:
    String ok = "a\tb\tc\td\te";
    String nok = "a\tb\tc\t\te";
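To reproduce this without any files, here is a minimal sketch. The class name TokenizerRepro and the helper tokenize are made up for illustration; tokenize uses the same loop as getContents above but takes the text directly instead of reading a File through Util.getContents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical reproduction class, not part of my real code base.
public class TokenizerRepro {

    // Same loop as getContents, but operating on a String instead of a File.
    static List<String> tokenize(String text, String separator) {
        List<String> contentList = new ArrayList<String>();
        StringTokenizer tokenizer = new StringTokenizer(text, separator);
        while (tokenizer.hasMoreTokens()) {
            contentList.add(tokenizer.nextToken());
        }
        return contentList;
    }

    public static void main(String[] args) {
        // One tab between every field: all 5 fields come back.
        System.out.println(tokenize("a\tb\tc\td\te", "\t").size()); // prints 5
        // Two consecutive tabs: the empty field is silently dropped.
        System.out.println(tokenize("a\tb\tc\t\te", "\t").size());  // prints 4
        // Only 4 tabs and no text: nothing comes back at all.
        System.out.println(tokenize("\t\t\t\t", "\t").size());      // prints 0
    }
}
```

In all three cases I would need the size to be 5, with empty strings in the positions where a field is missing.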
Ralf