I'm converting a very long JS regex for use in an Android Java library.
The JS code below reliably gives an array of 29 items, starting with ["","常","","に","","最新","、","最高"...]
var keywords = /(\ |[a-zA-Z0-9]+\.[a-z]{2,}|[一-龠々〆ヵヶゝ]+|[ぁ-んゝ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+)/g;
var source = '常に最新、最高のモバイル。Androidを開発した同じチームから。';
var result = source.split(keywords);
I converted it to Java (JDK 1.8.0_151) as below:
public class Japanese {
    static String keywords = "(\\ |[a-zA-Z0-9]+\\.[a-z]{2,}|[一-龠々〆ヵヶゝ]+|[ぁ-んゝ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+)";

    public static String[] Split(String str) {
        return str.split(keywords, -1);
    }
}
@Test
public void test1() {
    String source = "常に最新、最高のモバイル。Androidを開発した同じチームから。";
    String[] result = Japanese.Split(source);
    System.out.println("length: " + String.valueOf(result.length));
    for (String w : result) {
        System.out.println(!w.equals("") ? w : "EMPTY");
    }
}
The test result has only 15 items, and they are not the items I expected, as shown below:
length: 15
EMPTY
EMPTY
EMPTY
、
EMPTY
EMPTY
。
EMPTY
EMPTY
EMPTY
EMPTY
EMPTY
EMPTY
EMPTY
。
What am I doing wrong?
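
One possible explanation, offered as a sketch rather than a confirmed diagnosis: JavaScript's split() also puts the text matched by a capturing group into the result array, whereas Java's String.split() never returns the delimiters. That would account for 29 items in JS (15 pieces plus 14 matches) against 15 in Java. Assuming that is the cause, the JS behavior can be approximated with a Matcher loop. The class name JsStyleSplit and the method splitKeepingMatches are hypothetical names made up for this illustration, and the sketch assumes the pattern has exactly one capturing group that always participates in a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: approximates JS split() with one capturing group.
final class JsStyleSplit {

    // Returns the pieces between matches, interleaved with the text captured by group 1.
    static List<String> splitKeepingMatches(Pattern pattern, String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int last = 0;
        while (m.find()) {
            parts.add(input.substring(last, m.start())); // text before the match (may be empty)
            parts.add(m.group(1));                       // the matched "delimiter" itself
            last = m.end();
        }
        parts.add(input.substring(last));                // trailing text after the last match
        return parts;
    }
}
```

Calling JsStyleSplit.splitKeepingMatches(Pattern.compile(Japanese.keywords), source) should then give the same interleaved layout as the JS array, with empty strings between adjacent matches.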