148

I need to split a String into an array of single character Strings.

Eg, splitting "cat" would give the array "c", "a", "t"

lospejos
Matt

12 Answers

142
"cat".split("(?!^)")

This will produce

array ["c", "a", "t"]

Stephan
coberty
  • 11
    How and why? Is this a regex meaning any character? Because in my mind, with the way split works, this should split on only the actual characters (, ?, !, ^, and ). However, it works as you say it does. – Ty_ Mar 06 '14 at 02:07
  • 5
    This is indeed a regular expression, using what is called a negative lookahead. Check out the documentation here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#sum – Erwin May 28 '14 at 08:51
  • 4
    @EW-CodeMonkey `(?!`...`)` is regex syntax for a negative assertion – it asserts that there is no match of what is inside it. And `^` matches the beginning of the string, so the regex matches at every position that is *not* the beginning of the string, and inserts a split there. This regex also matches at the end of the string and so would also append an empty string to the result, except that the `String.split` documentation [says](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-) "trailing empty strings are not included in the resulting array". – Boann Nov 09 '15 at 00:46
  • 12
    In Java 8 the behavior of `String.split` was slightly changed so that *leading* empty strings produced by a zero-width match also are not included in the result array, so the `(?!^)` assertion that the position is not the beginning of the string becomes unnecessary, allowing the regex to be simplified to nothing – `"cat".split("")` – but in Java 7 and below that produces a leading empty string in the result array. – Boann Nov 09 '15 at 00:52
  • 3
    It creates an array of an entire string. – Eduard Sep 25 '17 at 10:17
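
For anyone who wants to try the lookahead version locally, here is a minimal sketch. The class name `LookaheadSplitDemo` and the helper `splitChars` are made up for illustration and are not from the answer. It only relies on `String.split` with the `(?!^)` pattern discussed in the comments above, and like that pattern it splits per `char`, so it is only appropriate for text without surrogate pairs.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Hypothetical helper: splits at every position that is not the
    // beginning of the string, so each char becomes its own String.
    static String[] splitChars(String s) {
        return s.split("(?!^)");
    }

    public static void main(String[] args) {
        // The match at the end of the string yields a trailing empty
        // string, which String.split drops from the result.
        System.out.println(Arrays.toString(splitChars("cat"))); // [c, a, t]
    }
}
```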
133
"cat".toCharArray()

But if you need strings

"cat".split("")

Edit: on Java 7 and below this returns an empty first value; from Java 8 on, the leading empty string is no longer included.

Yuriy Faktorovich
  • 14
    "cat".split("") would return [, c, a, t], no? You will have an extra element in your array... – reef Mar 08 '11 at 16:48
  • 5
    The "cat".split("") does not work as expected by Matt, you will get an extra empty String => [, c, a, t]. – reef Mar 08 '11 at 16:57
  • 6
    This answer does now work if you're using Java 8. See http://stackoverflow.com/a/22718904/1587046 – Alexis C. Apr 25 '14 at 14:01
  • 7
    This was a horrific change in jdk8 because i relied on split("") and did workarounds cause of this silly empty first index. Now after upgrading to java8, it works as i would have expected it years ago. unfortunately now my workaround breaks my code... ggrrrr. – Logemann Oct 16 '15 at 00:52
  • @Marc You should probably be using `.toCharArray()` anyway; it avoids regex and returns an array of `char` primitives so it's faster and lighter. It's odd to need an array of 1-character *strings*. – Boann Nov 09 '15 at 15:31
  • String[] newCAT = Arrays.copyOfRange(cat, 1, cat.length); – atiruz Jun 06 '18 at 20:52
  • In android api 30 there isn't extra character and my code worked only on android api<30 please delete this answer lol! – Evilripper Mar 21 '21 at 16:46
  • `"cat".toCharArray()` is the correct solution – garryp Oct 05 '21 at 12:34
48
String str = "cat";
char[] cArray = str.toCharArray();
jmj
Raman
  • 4
    Nitpicking, the original question asks for an array of String, not an array of Char. However it's quite easy to get an array of String from here. – dsolimano Mar 08 '11 at 16:41
  • Yeah, I already know how to get an array of chars. I can just iterate through the char array and create a string from each one though, if there's no other way. – Matt Mar 08 '11 at 23:11
  • How would you convert `cArray` back to `String`? – Bitmap Jun 27 '11 at 08:19
  • ^12 years late but it's just `new String(cArray);` – Siddhartha May 11 '23 at 04:13
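
The conversion mentioned in the comments (char array to an array of Strings, and back to one String) could look roughly like this. The names `CharArrayToStrings` and `toStringArray` are placeholders chosen for this sketch, and it works per `char`, so it assumes the text contains no surrogate pairs.

```java
import java.util.Arrays;

class CharArrayToStrings {
    // Hypothetical helper: one String per char (UTF-16 code unit).
    static String[] toStringArray(String str) {
        char[] cArray = str.toCharArray();
        String[] result = new String[cArray.length];
        for (int i = 0; i < cArray.length; i++) {
            result[i] = String.valueOf(cArray[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "cat";
        System.out.println(Arrays.toString(toStringArray(str))); // [c, a, t]
        // Going the other way: a char[] back to a single String.
        System.out.println(new String(str.toCharArray())); // cat
    }
}
```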
12

If characters beyond the Basic Multilingual Plane are expected in the input (some CJK characters, newer emoji...), approaches such as "a🌹b".split("(?!^)") cannot be used, because they break such characters apart (the result is the array ["a", "?", "?", "b"]), and something safer has to be used:

"a🌹b".codePoints()
    .mapToObj(cp -> new String(Character.toChars(cp)))
    .toArray(size -> new String[size]);
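
A self-contained way to check this, as a sketch only: the class name `CodePointSplitDemo`, the helper `splitByCodePoint`, and the sample character U+1F600 are arbitrary choices made for this illustration.

```java
class CodePointSplitDemo {
    // Hypothetical helper: one String per Unicode code point, so a
    // surrogate pair stays together in a single element.
    static String[] splitByCodePoint(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 (stored as two chars), then 'b'.
        String input = "a\uD83D\uDE00b";
        String[] parts = splitByCodePoint(input);
        System.out.println(parts.length);      // 3
        System.out.println(parts[1].length()); // 2
    }
}
```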
Jan Molnár
6

split("(?!^)") does not work correctly if the string contains surrogate pairs. You should use split("(?<=.)").

String[] splitted = "花ab🌹🌺🌷".split("(?<=.)");
System.out.println(Arrays.toString(splitted));

output:

[花, a, b, 🌹, 🌺, 🌷]
5

To sum up the other answers...

This works on all Java versions:

"cat".split("(?!^)")

This only works on Java 8 and up:

"cat".split("")
Lezorte
3

An efficient way of turning a String into an array of one-character Strings would be to do this:

String[] res = new String[str.length()];
for (int i = 0; i < str.length(); i++) {
    res[i] = Character.toString(str.charAt(i));
}

However, this does not take account of the fact that a char in a String could actually represent half of a Unicode code-point. (If the code-point is not in the BMP.) To deal with that you need to iterate through the code points ... which is more complicated.

This approach will be faster than using String.split(/* clever regex*/), and it will probably be faster than using Java 8+ streams. It is probably faster than this:

String[] res = new String[str.length()];
int i = 0;
for (char ch : str.toCharArray()) {
    res[i++] = Character.toString(ch);
}

because toCharArray has to copy the characters to a new array.

Stephen C
2
for (int i = 0; i < str.length(); i++) {
    System.out.println(str.charAt(i));
}
JV More
    Are you sure that this is going to split a string into an array? You're just printing the string to the screen. – TDG Jun 07 '16 at 17:21
1

If the original string contains supplementary Unicode characters, then split() will not work, because it splits each of these characters into its two surrogates. To handle such characters correctly, code like this works:

String[] chars = new String[stringToSplit.codePointCount(0, stringToSplit.length())];
for (int i = 0, j = 0; i < stringToSplit.length(); j++) {
    int cp = stringToSplit.codePointAt(i);
    char c[] = Character.toChars(cp);
    chars[j] = new String(c);
    i += Character.charCount(cp);
}
Daniel Nitzan
1

Maybe you can use a for loop that goes through the String content and extracts it character by character using the charAt method.

Combined with an ArrayList<String> for example you can get your array of individual characters.
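
A rough sketch of that idea, with names (`CharAtListDemo`, `splitWithCharAt`) invented for the example. It works per `char`, so it assumes the input has no surrogate pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CharAtListDemo {
    // Hypothetical helper: collects one String per char in a list,
    // then copies the list into a String array.
    static String[] splitWithCharAt(String str) {
        List<String> chars = new ArrayList<>();
        for (int i = 0; i < str.length(); i++) {
            chars.add(String.valueOf(str.charAt(i)));
        }
        return chars.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithCharAt("cat"))); // [c, a, t]
    }
}
```

Since the final length is known up front, the list is not strictly needed; a plain `String[]` of size `str.length()` would do the same job.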

reef
  • Maybe you could stand on one leg and sing "God Save the Queen". Sorry, but this isn't even close to correct. – Stephen C Jan 27 '17 at 07:54
1

In my previous answer I mixed up with JavaScript. Here goes an analysis of performance in Java.

I agree that Unicode surrogate pairs in Java Strings need attention. They break the meaning of methods like String.length(), and even the functional meaning of Character, because a char is ultimately a technical object that may not represent one character in human language.
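
To make the String.length() point concrete, here is a small sketch. The class and method names are made up for illustration, and U+1F600 is just an arbitrary supplementary character.

```java
class LengthVsCodePoints {
    // Number of UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the whole string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 (one supplementary character) followed by 'x'.
        String s = "\uD83D\uDE00x";
        System.out.println(codeUnits(s));  // 3
        System.out.println(codePoints(s)); // 2
        // The first two chars form a single surrogate pair.
        System.out.println(Character.isSurrogatePair(s.charAt(0), s.charAt(1))); // true
    }
}
```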

I implemented 4 methods that split a string into a list of character-representing strings (Strings corresponding to the human meaning of characters). And here's the result of the comparison:

A line is a String consisting of 1000 arbitrarily chosen emojis and 1000 ASCII characters (1000 times <emoji><ascii>, total 2000 "characters" in human meaning).

Comparison of different splitting methods

(discarding the 256 and 512 measurements) [chart image]

Implementations:

  • codePoints (java 11 and above)
    public static List<String> toCharacterStringListWithCodePoints(String str) {
        if (str == null) {
            return Collections.emptyList();
        }
        return str.codePoints()
            .mapToObj(Character::toString)
            .collect(Collectors.toList());
    }
  • classic
    public static List<String> toCharacterStringListWithIfBlock(String str) {
        if (str == null) {
            return Collections.emptyList();
        }
        List<String> strings = new ArrayList<>();
        char[] charArray = str.toCharArray();
        int delta = 1;
        for (int i = 0; i < charArray.length; i += delta) {
            delta = 1;
            if (i < charArray.length - 1 && Character.isSurrogatePair(charArray[i], charArray[i + 1])) {
                delta = 2;
                strings.add(String.valueOf(new char[]{ charArray[i], charArray[i + 1] }));
            } else {
                strings.add(Character.toString(charArray[i]));
            }
        }
        return strings;
    }
  • regex
    static final Pattern p = Pattern.compile("(?<=.)");
    public static List<String> toCharacterStringListWithRegex(String str) {
        if (str == null) {
            return Collections.emptyList();
        }
        return Arrays.asList(p.split(str));
    }

Annex (RAW DATA):

codePoints;classic;regex;lines
45;44;84;256
14;20;98;512
29;42;91;1024
52;56;99;2048
87;121;174;4096
175;221;375;8192
345;411;839;16384
667;826;1285;32768
1277;1536;2440;65536
2426;2938;4238;131072
0

We can do this simply by

const string = 'hello';
console.log([...string]); // -> ['h','e','l','l','o']

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax says

Spread syntax (...) allows an iterable such as an array expression or string to be expanded...

So, strings can be quite simply spread into arrays of characters.