
I'm converting a very long JS regex for use in an Android Java library.

The JS code below reliably produces an array of 29 items, starting with ["","常","","に","","最新","、","最高"...]

var keywords = /(\ |[a-zA-Z0-9]+\.[a-z]{2,}|[一-龠々〆ヵヶゝ]+|[ぁ-んゝ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+)/g;
var source = '常に最新、最高のモバイル。Androidを開発した同じチームから。';
var result = source.split(keywords);

I converted it to Java (JDK 1.8.0_151) as follows:

public class Japanese {
    static String keywords = "(\\ |[a-zA-Z0-9]+\\.[a-z]{2,}|[一-龠々〆ヵヶゝ]+|[ぁ-んゝ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[a-zA-Z0-9]+)";

    public static String[] Split(String str) {
        return str.split(keywords, -1);
    }
}


@Test
public void test1() {
    String source= "常に最新、最高のモバイル。Androidを開発した同じチームから。";
    String[] result = Japanese.Split(source);

    System.out.println("length: " + String.valueOf(result.length));
    for (String w : result) {
        System.out.println(!w.equals("") ? w : "EMPTY");
    }
}

The test output has only 15 items, and the items themselves are wrong, as shown below:

length: 15
EMPTY
EMPTY
EMPTY
、
EMPTY
EMPTY
。
EMPTY
EMPTY
EMPTY
EMPTY
EMPTY
EMPTY
EMPTY
。

What am I doing wrong?

Youngjae
  • JS regex `split` method outputs captured substrings, Java's one does not. – Wiktor Stribiżew Sep 04 '18 at 10:36
  • I hope you found an answer [in the linked thread](https://stackoverflow.com/questions/2206378/how-to-split-a-string-but-also-keep-the-delimiters) that provides the solution. There are at least two there. – Wiktor Stribiżew Sep 04 '18 at 11:40
  • @WiktorStribiżew thanks for the comment. I read, but I couldn't get it as my Regex is not that simple. I'm still reading and trying. – Youngjae Sep 04 '18 at 11:51
  • Have a look at [this answer](https://stackoverflow.com/a/279549/3832970). It does almost all you need, you only need to adjust the method where the substrings between the previous and current match are added to the resulting array. – Wiktor Stribiżew Sep 04 '18 at 11:53
  • @WiktorStribiżew Thanks for pointing it out for me. It works. I should study this. – Youngjae Sep 04 '18 at 11:58
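
For readers landing here, the approach suggested in the comments can be sketched roughly as follows. This is not the code from the linked answer; it is a minimal illustration that walks the matches with `java.util.regex.Matcher` and adds both the text between matches and the captured groups, which is what JS `split` does with a capturing regex. The class name `SplitKeepingDelimiters` and its `split` method are hypothetical names chosen for this example, and the sketch assumes the pattern never matches an empty string (true for the regex in the question).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the question's code): emulates the
// JS behaviour where split() also returns the captured substrings.
class SplitKeepingDelimiters {
    static List<String> split(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> parts = new ArrayList<>();
        int last = 0; // end index of the previous match
        while (m.find()) {
            if (m.start() == m.end()) {
                continue; // assumption: empty matches are simply skipped
            }
            // Text between the previous match and this one (may be empty).
            parts.add(input.substring(last, m.start()));
            // Every capture group, in order; an unmatched group yields null
            // (JS would give undefined in the same slot).
            for (int g = 1; g <= m.groupCount(); g++) {
                parts.add(m.group(g));
            }
            last = m.end();
        }
        // Trailing text after the final match (may be empty).
        parts.add(input.substring(last));
        return parts;
    }
}
```

Calling `SplitKeepingDelimiters.split(source, Japanese.keywords)` with the strings from the question should then give the same alternating layout as the JS version (text between matches, then the matched token), rather than the 15 leftover pieces that `String.split` returns.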
