I need to split a string based on the delimiters - and . (dash and dot). Below is my desired output.

AA.BB-CC-DD.zip ->

AA
BB
CC
DD
zip 

But the following code does not work:

private void getId(String pdfName){
    String[]tokens = pdfName.split("-\\.");
}
Thang Pham
  • Based on what you said, it looks like it is working fine. What is your desired output? – Jeff May 13 '11 at 14:59
  • @Jeff: He showed his desired output (`AA` / `BB` / `CC` ...) – T.J. Crowder May 13 '11 at 15:02
  • Are you sure? I interpreted that as his current output, not his desired output. Maybe it's time to stand up and walk around a little bit. – Jeff May 13 '11 at 15:04
  • @Jeff: Sorry for the confusion, I updated my post to clear up your misunderstanding. – Thang Pham May 13 '11 at 15:05
  • Regex will degrade your performance. I would recommend writing a method that goes character by character and splits the string where needed. You can optimize this further to get log(n) performance. – Princesh Feb 16 '13 at 17:55
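The comment above suggests scanning the string by hand. Below is a minimal sketch of that idea, assuming single-character delimiters; `ManualSplit` and `splitOnChars` are hypothetical names, not an existing API. Any correct split still has to examine every character once, so this remains linear in the input length; what it avoids is compiling and running a regex.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplit {
    // Hypothetical helper: splits `input` on any character that appears in `delimiters`.
    // Unlike String.split, it keeps empty tokens, including trailing ones.
    static List<String> splitOnChars(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        int start = 0; // start index of the token currently being scanned
        for (int i = 0; i < input.length(); i++) {
            if (delimiters.indexOf(input.charAt(i)) >= 0) {
                tokens.add(input.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(input.substring(start)); // the token after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnChars("AA.BB-CC-DD.zip", "-.")); // [AA, BB, CC, DD, zip]
    }
}
```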

15 Answers

I think you need to include the regex OR operator:

String[]tokens = pdfName.split("-|\\.");

What you have will only match a dash immediately followed by a dot (-.), not a dash or a dot on its own (- or .).
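To see the difference concretely, here is a small sketch (the class name `SplitComparison` is only for illustration) that runs both patterns on the question's input:

```java
import java.util.Arrays;

public class SplitComparison {
    public static void main(String[] args) {
        String pdfName = "AA.BB-CC-DD.zip";

        // "-\\." is the regex -\. : a dash immediately followed by a literal dot.
        // The input never contains "-.", so nothing is split.
        String[] sequence = pdfName.split("-\\.");
        System.out.println(Arrays.toString(sequence)); // [AA.BB-CC-DD.zip]

        // "-|\\." is the regex -|\. : a dash OR a literal dot.
        String[] either = pdfName.split("-|\\.");
        System.out.println(Arrays.toString(either)); // [AA, BB, CC, DD, zip]
    }
}
```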

Richard H
  • Why do we need two backslashes? – pjain Feb 21 '16 at 13:16
  • The `.` character in regex means any character other than a newline (http://www.tutorialspoint.com/java/java_regular_expressions.htm). In this case, however, they wanted the literal character `.`. In the regex, a backslash escapes the dot (`\.`), and because the backslash is also the escape character in Java string literals, it has to be doubled: `"\\."`. – Monkeygrinder Feb 21 '16 at 19:25
  • For normal cases it would be `.split("match1|match2")` (e.g. `split("https|http")`); the `\\` escapes the special character `.` in the case above. – prayagupa Sep 14 '18 at 22:17
  • Or, more generally, you can use `pdfName.split("\\W");`, as in @Varun Gangal's answer below. – Ahmed Nabil Apr 10 '19 at 21:08
  • Use `[-.]` instead of `-|\\.`. – Saeed Jul 04 '19 at 06:01
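The two backslashes come from two layers of escaping: the regex needs `\.` for a literal dot, and the Java string literal needs `\\` to produce a single backslash. A small sketch illustrating this (the class name is illustrative); `Pattern.quote` from `java.util.regex` is shown as one alternative that does the escaping for you:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapeDot {
    public static void main(String[] args) {
        String s = "a.b";

        // Unescaped "." is the regex "any character": every character is a delimiter,
        // all tokens are empty, and split drops trailing empty strings, leaving nothing.
        System.out.println(s.split(".").length); // 0

        // The Java literal "\\." holds two characters, a backslash and a dot: the regex \. (literal dot).
        System.out.println("\\.".length()); // 2
        System.out.println(Arrays.toString(s.split("\\."))); // [a, b]

        // Pattern.quote escapes the whole delimiter string.
        System.out.println(Arrays.toString(s.split(Pattern.quote(".")))); // [a, b]
    }
}
```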

Try the regex "[-.]+". The trailing + treats consecutive delimiter characters as a single delimiter; remove the + if you do not want this.
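A quick sketch (illustrative class name) of what the + changes when delimiters are adjacent:

```java
import java.util.Arrays;

public class CollapseDelimiters {
    public static void main(String[] args) {
        String s = "AA..BB--CC";

        // [-.] matches exactly one delimiter, so ".." and "--" each leave an empty token.
        System.out.println(Arrays.toString(s.split("[-.]"))); // [AA, , BB, , CC]

        // [-.]+ matches a whole run of delimiters, so consecutive ones act as one.
        System.out.println(Arrays.toString(s.split("[-.]+"))); // [AA, BB, CC]
    }
}
```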

Peter Knego
  • @Lurkers: The only reason Peter didn't have to escape that `-` was that it's the *first* thing inside the `[]`; otherwise there would need to be a backslash in front of it (and of course, to put a backslash in front of it, we need *two* because this is a string literal). – T.J. Crowder May 13 '11 at 18:32
  • I think this answer is better than the accepted one, because when you use the alternation operator |, consecutive delimiters produce empty strings among your resulting tokens. This will not happen with Peter Knego's [-.]+ – Jack' Jan 03 '18 at 16:05

You can use the regex "\W". This matches any non-word character (anything other than a letter, digit, or underscore). The required line would be:

String[] tokens=pdfName.split("\\W");
Varun Gangal
  • It doesn't work for me with `String s = "id(INT), name(STRING),"`. Using \\W here creates an array of length 6, whereas it should be only 4. – user3527975 Mar 02 '15 at 03:25
  • This will also break when the input contains Unicode characters. It's best to include only the actual delimiters instead of a "grab all" with `\W`. – nhahtdh Oct 07 '15 at 07:23
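A sketch of the pitfall mentioned in the last comment, using a made-up file name that contains an accented letter (the class name is illustrative). In Java, `\W` is ASCII-only unless the `UNICODE_CHARACTER_CLASS` flag is enabled, which the embedded flag `(?U)` does:

```java
import java.util.Arrays;

public class NonWordSplit {
    public static void main(String[] args) {
        // Works for the question's input: dash and dot are both non-word characters.
        System.out.println(Arrays.toString("AA.BB-CC-DD.zip".split("\\W"))); // [AA, BB, CC, DD, zip]

        // Made-up name: "cafe" with an accented final e, followed by -menu.zip.
        String name = "caf\u00e9-menu.zip";

        // Default \W is ASCII-only, so the accented letter is treated as a delimiter.
        System.out.println(Arrays.toString(name.split("\\W"))); // [caf, , menu, zip]

        // (?U) enables UNICODE_CHARACTER_CLASS, so the accented letter counts as a word character.
        System.out.println(Arrays.toString(name.split("(?U)\\W")));

        // Listing only the real delimiters avoids the problem altogether.
        System.out.println(Arrays.toString(name.split("[-.]")));
    }
}
```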

The string you give split is the string form of a regular expression, so:

private void getId(String pdfName){
    String[] tokens = pdfName.split("[-.]");
    // ...
}

That means "split on any character within the []" (so, split on - and .). A couple of notes on that:

  1. Normally, you have to escape the dot (.) by putting a backslash in front of it because in a regular expression . means "any character." But you don't have to do that within a character class ([]).
  2. Normally, within a character class ([]), you have to escape the dash (-) because in that context it has special meaning (it indicates a range, like [0-9A-Fa-f] to match all hex digits). But when it's the first character after the [, we don't have to escape it.
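A short sketch (illustrative class name) of both notes: a dash placed between two characters forms a range, and a dot inside a character class is literal. The `[+-.]` class is a deliberately bad example, not something from the answer:

```java
import java.util.Arrays;

public class DashInCharacterClass {
    public static void main(String[] args) {
        String s = "a,b-c.d";

        // Dash first in the class is a literal dash: splits on - and . only.
        System.out.println(Arrays.toString(s.split("[-.]"))); // [a,b, c, d]

        // Dash between two characters forms a range: [+-.] means '+' through '.',
        // which also covers ',' (the characters + , - . are consecutive in ASCII).
        System.out.println(Arrays.toString(s.split("[+-.]"))); // [a, b, c, d]

        // Inside a character class the dot is literal, so it needs no backslash.
        System.out.println(Arrays.toString("x.y".split("[.]"))); // [x, y]
    }
}
```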

If you did need to escape either of those, the way you'd do it is by having a backslash in front of it in the string. Since we're writing this as a string literal, to actually put a backslash in the string requires that we escape it, since otherwise it's an escape character (for instance, \n means newline, \t means tab, etc.). So we'd have to write \\ to put an actual backslash in the string for the regular expression engine to see it and use it to escape the next character (- or .). For instance, "[\\-.]" if we wanted to escape the - even though we don't need to.
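To make the string-literal point concrete, this sketch (illustrative class name) checks what the literal `"[\\-.]"` actually contains and confirms it splits the same way as `"[-.]"`:

```java
import java.util.Arrays;

public class EscapedDashLiteral {
    public static void main(String[] args) {
        String regex = "[\\-.]"; // the Java literal produces the 5-character regex [\-.]
        System.out.println(regex.length()); // 5
        System.out.println(regex.charAt(1) == '\\'); // true: one real backslash

        // Escaping the dash changes nothing here; both classes match - or .
        String pdfName = "AA.BB-CC-DD.zip";
        System.out.println(Arrays.equals(pdfName.split("[\\-.]"), pdfName.split("[-.]"))); // true
    }
}
```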

Live example: https://ideone.com/PMA8d3

T.J. Crowder
  • You don't need to escape the hyphen in this case, because `[-.]` couldn't possibly be interpreted as a range. – Alan Moore May 13 '11 at 15:40
  • @Alan: Because it's the very first thing in the class, that's quite true. But I always do; it's too easy to go back later and add something in front of it without thinking. Escaping it costs nothing, so... – T.J. Crowder May 13 '11 at 18:31
  • Do you know how to escape the brackets? I have the String "[200] Engineering" that I want to split into "200" and "Engineering". – scottysseus Jul 30 '13 at 21:03
  • Oh wow, I got it... I had to use two backslashes instead of one. `String[] strings = codes.get(x).split("\\[|\\]| ");` <-- code for anyone interested – scottysseus Jul 30 '13 at 21:05
  • Can you explain why we need to "escape the backlash because this is a string?" – Goh-shans Mar 27 '23 at 04:17
  • @Goh-shans - I didn't phrase that well (I've updated it). In a string *literal*, a backslash is an escape character (`\n` means newline, `\t` means tab, etc.). So to put a backslash in the string, we have to escape it (with a backslash). So `"[\\-.]"` creates a regular expression using the string `[\-.]`, which escapes the `-`. **But**, there's no need to do that when the `-` is the first character after the `[` anyway. – T.J. Crowder Mar 27 '23 at 08:11
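A sketch of how the pattern from the bracket comment above behaves on that input (the class name is illustrative). Because `split` keeps a leading empty string when the first character is a delimiter, the second version filters out empty tokens; that filtering step is an assumption about what the caller wants, not part of the original comment:

```java
import java.util.Arrays;

public class SplitBrackets {
    public static void main(String[] args) {
        String s = "[200] Engineering";

        // The match at index 0 ('[') yields a leading empty string,
        // and "] " yields an empty token between the two delimiters.
        System.out.println(Arrays.toString(s.split("\\[|\\]| "))); // [, 200, , Engineering]

        // A character class with + merges "] " into one delimiter; the leading
        // empty string from '[' remains, so this sketch skips empty tokens.
        String[] parts = Arrays.stream(s.split("[\\[\\] ]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
        System.out.println(Arrays.toString(parts)); // [200, Engineering]
    }
}
```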

Using Guava you could do this:

Iterable<String> tokens = Splitter.on(CharMatcher.anyOf("-.")).split(pdfName);
ColinD

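For two multi-character delimiters such as "AND" and "OR", this should work. Put word boundaries (\b) around the alternatives; otherwise a word that merely contains a delimiter, such as the "OR" in "YORK", gets split too. Don't forget to trim the tokens afterwards.

 String text ="ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";
 String[] cities = text.split("\\b(?:AND|OR)\\b");

Result : cities = {"ISTANBUL ", " NEW YORK ", " PARIS ", " TOKYO ", " MOSCOW"}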
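A variant sketch that also removes the surrounding spaces, so no `trim()` is needed (the class name is illustrative). It wraps the alternation in `\b` word boundaries, which matters here because "YORK" contains "OR":

```java
import java.util.Arrays;

public class SplitOnWords {
    public static void main(String[] args) {
        String text = "ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";

        // \s* on both sides swallows the surrounding spaces, so no trim() is needed;
        // \b keeps words that merely contain a delimiter (YORK contains OR) intact.
        String[] cities = text.split("\\s*\\b(?:AND|OR)\\b\\s*");
        System.out.println(Arrays.toString(cities)); // [ISTANBUL, NEW YORK, PARIS, TOKYO, MOSCOW]
    }
}
```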

ÖMER TAŞCI

pdfName.split("[.-]+");

  • [.-] -> either . or - can act as a delimiter

  • + -> if the delimiters occur consecutively, the whole run is treated as a single delimiter.
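One caveat, shown in a small sketch (illustrative class name): + merges runs of delimiters, but a delimiter at the very start of the string still yields a leading empty token. Stripping leading delimiters first, as below, is one possible workaround:

```java
import java.util.Arrays;

public class LeadingDelimiter {
    public static void main(String[] args) {
        // + merges runs, but a leading delimiter still yields one empty token
        // (trailing empty tokens are dropped by split).
        System.out.println(Arrays.toString(".AA--BB.".split("[.-]+"))); // [, AA, BB]

        // Strip leading delimiters before splitting.
        String cleaned = ".AA--BB.".replaceFirst("^[.-]+", "");
        System.out.println(Arrays.toString(cleaned.split("[.-]+"))); // [AA, BB]
    }
}
```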

Trying

I'd use Apache Commons:

import org.apache.commons.lang3.StringUtils;

private void getId(String pdfName){
    String[] tokens = StringUtils.split(pdfName, "-.");
}

It'll split on any of the specified separator characters, as opposed to StringUtils.splitByWholeSeparator(str, separator), which uses the complete string as a separator.

Edd

String[] token=s.split("[.-]");
Nitish
  • Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation. – Yunnosch Jun 25 '19 at 17:48

It's better to use something like this:

s.split("[\\s\\-\\.\\'\\?\\,\\_\\@]+");

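I have added a few other characters as examples. Escaping every character in the class is the safest way to write it, because you don't have to remember which characters (such as . and -) have a special meaning.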

Pritam Banerjee

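Try this code (it is JavaScript rather than Java, but the regex /[-.]/ splits on a dash or a dot in the same way):

const string = 'AA.BB-CC-DD.zip';
const array = string.split(/[-.]/);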
Reaper
  • Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation. – Yunnosch Jun 25 '19 at 17:49

You can also specify a regular expression as the argument to the split() method; see the example below:

private void getId(String pdfName){
    String[] tokens = pdfName.split("-|\\.");
}
bummi

s.trim().split("[\\W]+")

should work.

sss
  • First, no, it does not work. Maybe you could try it before posting? Also, [this answer](https://stackoverflow.com/questions/5993779/use-string-split-with-multiple-delimiters#answer-13928086) is the same as yours, but working. Finally, you should check your formatting (_should work._). – Arount Oct 11 '17 at 23:19
  • Please help fighting the misunderstanding that StackOverflow is a free code-writing service, by augmenting your code-only answer with some explanation. – Yunnosch Jun 25 '19 at 17:49

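You might expect split to accept varargs so that you can pass several delimiters as separate arguments, but it does not: String.split only has the overloads split(String regex) and split(String regex, int limit), so pdfName.split("-", ".") does not compile. Put all the delimiters into one regex instead:

 String[] tokens = pdfName.split("[-.]");

You can list as many delimiter characters inside the brackets as you want.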

Rajesh Koshti

If you know the string will always be in the same format, first split it on . (written as "\\." because an unescaped . matches any character). That gives three parts: the first field, the middle part containing the dashes, and the extension. Then split the middle part on - to get the remaining fields.

Refer to the following snippet:

String[] tmp = pdfName.split("\\.");   // AA, BB-CC-DD, zip
String val1 = tmp[0];
String ext = tmp[2];
tmp = tmp[1].split("-");               // BB, CC, DD
String val2 = tmp[0];
...
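A complete, runnable version of this fixed-format approach, assuming the name always has exactly the shape `X.Y-Z-W.ext`; the class and method names (`FixedFormatSplit`, `parse`) are hypothetical:

```java
import java.util.Arrays;

public class FixedFormatSplit {
    // Assumes the shape X.Y-Z-W.ext (two dots, two dashes between them).
    // Other shapes may throw ArrayIndexOutOfBoundsException or silently drop parts.
    static String[] parse(String pdfName) {
        String[] dotParts = pdfName.split("\\.");    // [AA, BB-CC-DD, zip]
        String[] dashParts = dotParts[1].split("-"); // [BB, CC, DD]
        return new String[] {
            dotParts[0], dashParts[0], dashParts[1], dashParts[2], dotParts[2]
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("AA.BB-CC-DD.zip"))); // [AA, BB, CC, DD, zip]
    }
}
```

Because it hard-codes the positions, this breaks as soon as the format changes; the single-regex answers above are more robust.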
isometrik