
I want to do basic Chinese tokenization, simply breaking the string into individual characters. How can I do that in Java?

String str = "这是一个测试";

I want it to be:

["这", "是", "一", "个", "测", "试"]
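A minimal sketch of the requested output, assuming Java 8 or later, where `String.split("")` no longer produces a leading empty string. The class name `SplitChars` and the helper `tokenize` are illustrative only and do not come from the question.

```java
import java.util.Arrays;

public class SplitChars {
    // Splits the string into one-char Strings using an empty regex.
    // On Java 8+, the zero-width match at the start does not add an empty first element.
    static String[] tokenize(String s) {
        return s.split("");
    }

    public static void main(String[] args) {
        String str = "这是一个测试";
        System.out.println(Arrays.toString(tokenize(str))); // [这, 是, 一, 个, 测, 试]
    }
}
```

Each element here corresponds to one `char`, which is enough for common Chinese characters like the ones in this string.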

marlon
  • `str.split("")` – shmosel Sep 19 '17 at 01:37
  • If you only need a `char` array, `str.toCharArray()` will do. If you need a `String` array, use `str.split("")` as in the comment above; before Java 8 this also gives you an extra empty string as the first array element, have a look here https://stackoverflow.com/questions/5235401/split-string-into-array-of-character-strings/5235439#5235439. Note that a `char` holds only 16 bits: since Java 1.5, characters beyond that range are represented as surrogate pairs of two `char`s. – Mr. Duc Nguyen Sep 19 '17 at 01:41
  • @shmosel .split does not work here – Anthony Audette Sep 19 '17 at 01:48
  • @AnthonyAudette Works fine for me. – shmosel Sep 19 '17 at 01:49
  • @shmosel my bad, I was misreading the question. I thought they wanted it to split into the individual components of each character. – Anthony Audette Sep 19 '17 at 02:11
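Both `toCharArray()` and `split("")` work on individual `char`s, which is fine for the string in the question. If the text may contain characters outside the Basic Multilingual Plane (for example CJK Extension B ideographs such as U+20000), a code-point based split keeps each surrogate pair together. This is a minimal sketch assuming Java 8+ (`String.codePoints()`); the class name `CodePointTokenizer` is hypothetical.

```java
import java.util.Arrays;

public class CodePointTokenizer {
    // Splits a string into one String per Unicode code point, so a character
    // stored as a surrogate pair (two chars) is kept as a single token.
    static String[] tokenize(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // U+20000 (CJK Extension B) needs two chars in UTF-16.
        String str = "这是" + new String(Character.toChars(0x20000));
        System.out.println(str.toCharArray().length); // 4: the surrogate pair is split
        System.out.println(tokenize(str).length);     // 3: one token per character
        System.out.println(Arrays.toString(tokenize("这是一个测试")));
    }
}
```

For text that stays within the BMP, this gives the same result as `split("")`; the extra work only matters for supplementary characters.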
