
I have a file with one name per line, like this: subhash chand (line feed) yobie chimwanachomama (line feed) riadh chaieb (line feed)

If I run a regex search for [a-z][a-z], it returns "su bh as ch an yo...". Is there a regex pattern that would instead return overlapping matches of the form "su ub bh ha as sh ch ha an nd..."? The regex should work like a tokenizer of length 2. Ideally it would be a valid Java regex.

glennsl
Ashish Jain

1 Answer


Try this regex:

(?=([a-zA-Z]{2}))

This is a lookahead: it matches the empty string at every position that is followed by two letters ([a-zA-Z]{2}), and the capturing group inside it stores those two letters as group 1. Because each match consumes no characters, the engine moves on to the very next index and tries again, so overlapping pairs are all found and you get your expected result.
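To make the zero-width behaviour concrete, here is a minimal sketch that records where each match starts and ends along with group 1. The class name `LookaheadDemo` and the helper `describeMatches` are made up for illustration; only `Pattern`, `Matcher`, and the regex come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Hypothetical helper: returns one "start-end:group1" entry per match.
    static List<String> describeMatches(String input) {
        Matcher m = Pattern.compile("(?=([a-zA-Z]{2}))").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // start() == end() because the lookahead itself matches nothing;
            // the two letters are only available through group 1.
            out.add(m.start() + "-" + m.end() + ":" + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("chand"));
    }
}
```

For the input "chand" this prints `[0-0:ch, 1-1:ha, 2-2:an, 3-3:nd]`: every match is empty, and no match is reported at the final "d" because only one letter follows that position.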

You then just need to read group 1 of each match:

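import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Zero-width lookahead; group 1 captures the two letters that follow each position.
final String regex = "(?=([a-zA-Z]{2}))";
final String string = "subhash chand\n"
        + "yobie chimwanachomama\n"
        + "riadh chaieb";

final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);

// Each match is empty, so print the captured pair from group 1 instead of group 0.
while (matcher.find()) {
    System.out.println(matcher.group(1));
}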
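If you want the pairs collected rather than printed, or a token length other than 2, the same idea can be wrapped in a small method. This is only a sketch: the class `NGramTokenizer`, the method `ngrams`, and the parameter `n` are hypothetical names, and it assumes, as in the question, that tokens consist of ASCII letters only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NGramTokenizer {
    // Hypothetical helper: returns every run of n consecutive letters,
    // including overlapping ones, in order of appearance.
    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // Same lookahead as above, with the repetition count made configurable.
        Pattern pattern = Pattern.compile("(?=([a-zA-Z]{" + n + "}))");
        Matcher matcher = pattern.matcher(text);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Spaces and line feeds are not letters, so pairs never span two words.
        System.out.println(String.join(" ", ngrams("subhash chand", 2)));
    }
}
```

Running `main` prints `su ub bh ha as sh ch ha an nd`. Calling `ngrams("chand", 3)` would give `cha`, `han`, `and`.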


Sweeper