I would like to parse an entire file, splitting on all possible delimiters such as commas, colons, semicolons, periods, spaces, hyphens, etc.

Suppose I have a hypothetical line "Hi,X How-how are:any you?". I should get an output array with the items Hi, X, How, how, are, any and you.

How do I specify all these delimiters in the String.split method?

Thanks in advance.

Hovercraft Full Of Eels
Umesh K

1 Answer

String.split takes a regular expression. In this case you want to split on non-word characters (regex \W), so it's simply:

String input = "Hi,X How-how are:any you?";
String[] parts = input.split("[\\W]");

If you wanted to be more explicit, you could use the exact characters in the expression:

String[] parts = input.split("[,\\s\\-:\\?]");
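
As a hedged addition to the two expressions above, here is a minimal sketch that splits on runs of delimiters with `\\W+`, so that adjacent delimiters such as ", " do not produce empty strings. The class name `SplitDemo` and the helper `words` are hypothetical names used only for illustration. It also shows that digits count as word characters for `\W` (by default `\W` means anything other than `a-z`, `A-Z`, `0-9` and `_`), and that a `|` inside a character class is a literal pipe rather than an "or".

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: split on one or more consecutive non-word characters.
    static String[] words(String input) {
        return input.split("\\W+");
    }

    public static void main(String[] args) {
        // The sample line from the question.
        System.out.println(Arrays.toString(words("Hi,X How-how are:any you?")));
        // prints [Hi, X, How, how, are, any, you]

        // Digits are word characters, so they stay inside the tokens.
        System.out.println(Arrays.toString(words("a1, b2;  c3")));
        // prints [a1, b2, c3]

        // Inside [...] the pipe is a literal character, not alternation.
        System.out.println(Arrays.toString("a|b".split("[,|]")));
        // prints [a, b]
    }
}
```

Note that if the input starts with a delimiter, the first element of the result is an empty string, so you may want to skip it or trim the input first.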
Mark Elliot
  • why the or-ing operator in that expression above? Are they necessary? – Hovercraft Full Of Eels Sep 20 '11 at 23:06
  • @Hovercraft - no, but for me it's easier to read, so that's what I go with. – Mark Elliot Sep 20 '11 at 23:08
  • @Mark Does this \W regex treat numbers as non-word characters? What if I want to allow numbers? – Umesh K Sep 20 '11 at 23:13
  • My own preference is to show a newbie the regex without the unnecessary clutter. YMMV. – Hovercraft Full Of Eels Sep 20 '11 at 23:13
  • @UmeshKacha: please have a look at the tutorial section on this: [predefined character classes](http://download.oracle.com/javase/tutorial/essential/regex/pre_char_classes.html). Shoot, the whole tutorial is worthwhile. Then when done with this one, graduate to [this one](http://www.regular-expressions.info/tutorial.html) which is my favorite. – Hovercraft Full Of Eels Sep 20 '11 at 23:15
  • @MarkElliot I'm fairly sure that character class will match the pipe (`|`) character, rather than treating them as "or" instructions. This isn't really a problem in this case, but you should be aware that it's not doing what you think it's doing. It could cause confusion and bugs in more complicated code. – Samir Talwar Sep 20 '11 at 23:24
  • I just came across this while working on a project. I ran it without the [] and it ran just fine (I was running s.split(" |\\n") ). What are the brackets supposed to do? Are they necessary? – JR Smith Mar 29 '13 at 22:36