478

Why does the second line of this code throw ArrayIndexOutOfBoundsException?

String filename = "D:/some folder/001.docx";
String extensionRemoved = filename.split(".")[0];

While this works:

String driveLetter = filename.split("/")[0];

I use Java 7.

Sebastian Nielsen
Ali Ismayilov

4 Answers

949

You need to escape the dot if you want to split on a literal dot:

String extensionRemoved = filename.split("\\.")[0];

Otherwise you are splitting on the regex ., which means "any character".
Note the double backslash needed to create a single backslash in the regex.
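As a minimal sketch of the difference, the following compares the unescaped and escaped patterns on the filename from the question. The class name SplitOnDotDemo and the helper pieceCount are made up for illustration only.

```java
import java.util.Arrays;

class SplitOnDotDemo {
    // Hypothetical helper: how many pieces does split() return for this regex?
    static int pieceCount(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String filename = "D:/some folder/001.docx";

        // Unescaped: "." matches every character, so every piece is an empty
        // string, and split() drops all trailing empty strings.
        System.out.println(pieceCount(filename, "."));              // 0

        // Escaped: only the literal dot is a delimiter.
        System.out.println(Arrays.toString(filename.split("\\."))); // [D:/some folder/001, docx]
    }
}
```

Indexing [0] into the first result is what throws the ArrayIndexOutOfBoundsException, because the array has length 0.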


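With the unescaped regex, every character of the filename matches, so every piece produced by the split is an empty string (a "blank"). split(regex) removes all trailing blanks from the result, which leaves an empty array, and indexing it with [0] throws an ArrayIndexOutOfBoundsException.

Even with the escaped regex, you can get an ArrayIndexOutOfBoundsException if your input string is just a dot, i.e. ".". This is an edge case that produces an empty array when split on dot: splitting a dot on a dot leaves only two blanks, so after the trailing blanks are removed you're left with an empty array.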

To avoid getting an ArrayIndexOutOfBoundsException for this edge case, use the overload split(regex, limit), whose second parameter limits the size of the resulting array. When limit is negative, the removal of trailing blanks from the resulting array is disabled:

".".split("\\.", -1) // returns an array of two blanks, ie ["", ""]

That is, when filename is just a dot ".", calling filename.split("\\.", -1)[0] returns a blank (an empty string), but calling filename.split("\\.")[0] throws an ArrayIndexOutOfBoundsException.
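The edge case can be checked with a short sketch like the one below. The class name SplitLimitDemo and the helper firstPiece are illustrative assumptions, not part of any API.

```java
class SplitLimitDemo {
    // Hypothetical helper: first piece when trailing empty strings are kept.
    static String firstPiece(String input) {
        return input.split("\\.", -1)[0];
    }

    public static void main(String[] args) {
        // Default split: both empty pieces are trailing, so both are removed.
        System.out.println(".".split("\\.").length);     // 0

        // Negative limit: trailing empty strings are kept.
        System.out.println(".".split("\\.", -1).length); // 2

        // Safe to index, the result is an empty string.
        System.out.println(firstPiece(".").isEmpty());   // true
    }
}
```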

Bohemian
  • 1
    Note that filename can contain multiple dots. One must use the last index of "." and use that to find the substring of the filename. – saurabheights Jun 12 '17 at 14:24
  • 2
    @saurabheights The question was not about a correct regex, but rather why there was an `ArrayIndexOutOfBoundsException`. That said, you are incorrect: You don't need to know where the last dot is; you just need the right regex: `filename.split("\\.(?=[^.]*$)")`. This uses a *look ahead* to assert there are no dots anywhere in the input that follows the matching dot. – Bohemian Jun 12 '17 at 15:10
  • 1
    @emma you can delete them yourself via the “delete” link just beneath the question – Bohemian Aug 31 '19 at 01:17
  • 1
    A cleaner solution : str.split(Pattern.quote("."))[0] – A. Hafid Jan 05 '23 at 08:51
  • To split by a dot, you actually need to add four backslashes such as: – Tom Rutchik Jun 21 '23 at 21:55
  • @TomRutchik you actually **don't** need four backslashes: See [live demo](https://ideone.com/IvQNo1). – Bohemian Jun 21 '23 at 22:06
  • I've been testing this out interactively in an Eclipse IDE. Only 4 backslashes worked! I would get an exception if I only used 2 backslashes. – Tom Rutchik Jun 21 '23 at 22:09
  • Bohemian, you are right. It appears the Eclipse IDE has problems evaluating expressions that contain strings containing escape sequences. I didn't know that. I only use the Eclipse IDE to debug Java servlets. The Eclipse expression evaluator will work for escaped strings if you add another level of escaping, which is why 4 backslashes worked for me but 2 wouldn't. Well that's life in the trenches! – Tom Rutchik Jun 21 '23 at 23:39
  • @TomRutchik Expression evaluators will display `"\."` (the correct regex) for a String that is coded using the *String literal* `"\\."` - to put a backslash (Java's escape character) in a String when writing a String literal you escape the escape character. – Bohemian Jun 22 '23 at 02:35
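The alternatives raised in the comments above can be compared side by side in a rough sketch. The class and method names here are hypothetical; the look-ahead regex and Pattern.quote(".") come from the comments, and the lastIndexOf variant is one possible reading of the "last index" suggestion.

```java
import java.util.regex.Pattern;

class LastDotDemo {
    // Split only on the last dot, using the look-ahead regex from the comments.
    static String stripExtensionRegex(String filename) {
        return filename.split("\\.(?=[^.]*$)", -1)[0];
    }

    // Same goal without a regex: cut at the last dot, if there is one.
    static String stripExtensionIndex(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? filename : filename.substring(0, dot);
    }

    // Pattern.quote(".") treats the dot literally, but still splits on every dot.
    static String firstPieceQuoted(String filename) {
        return filename.split(Pattern.quote("."), -1)[0];
    }

    public static void main(String[] args) {
        String name = "archive.tar.gz";
        System.out.println(stripExtensionRegex(name)); // archive.tar
        System.out.println(stripExtensionIndex(name)); // archive.tar
        System.out.println(firstPieceQuoted(name));    // archive
    }
}
```

Note that Pattern.quote only replaces the manual escaping; it does not by itself handle filenames with more than one dot.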
138

The dot "." is a special character in Java's regex engine, so you have to use "\\." to escape it:

final String extensionRemoved = filename.split("\\.")[0];
Nimantha
aimhaj
  • 29
    It is _not_ a special character in Java. It's a special character in Java's regex engine. – Nic Apr 08 '16 at 12:09
  • 1
    I just wasn't very accurate in my response but I agree with you. thanks for the precision ;) – aimhaj Apr 08 '16 at 13:20
  • 1
    It's a distinction worth making. Also, I just realized that I messed up a bit myself; it is a special char in Java, but that's not why it's causing a problem here. Anyway. – Nic Apr 08 '16 at 13:21
35

This is because . is a reserved character in regular expressions, where it matches any character. Instead, we should use the following statement:

String extensionRemoved = filename.split("\\.")[0];
Gabriele Mariotti
21

I believe you should escape the dot. Try:

String filename = "D:/some folder/001.docx";
String extensionRemoved = filename.split("\\.")[0];

Otherwise, the dot is interpreted as a regular-expression metacharacter that matches any character.

Ivaylo Strandjev