
I am trying to split a sentence into groups of up to 32 characters with a regex. If the 32nd character is a letter inside a word, the split should fall on a word boundary rather than in the middle of that word. When my input contains a word with "-" in it, the regex splits that word too.

This is the regex I am using:

(\b.{1,32}\b\W?)

Input string:

Half Bone-in Spiral int with dark Packd Smithfield Half Bone-in Spiral Ham with Glaze Pack

Resulting groups:

  1. Half Bone-in Spiral int with
  2. dark Packd Smithfield Half Bone-
  3. in Spiral Ham with Glaze Pack

In the split above, "Bone-in" is one word, but the regex splits it as if it were two separate words. How can I modify my regex so that a word containing "-" is treated as a single word? In short, I want the split to come after "Bone-in", not inside it.

Thank You.

JohnyL
masterfly

1 Answer

You may use

(\b.{1,32}(?![\w-])\W?)

Details

  • \b - a word boundary
  • .{1,32} - 1 to 32 chars other than line break chars, as many as possible
  • (?![\w-]) - a negative lookahead: the char immediately to the right of the current location cannot be a word char (letter, digit or _) or a - char
  • \W? - an optional non-word char.
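
To see the effect of swapping \b for the lookahead, here is a minimal sketch that runs both patterns against the sample input from the question and prints the matches. The class name HyphenBoundaryDemo and the helper matches are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyphenBoundaryDemo {
    // Hypothetical helper: returns every match of the regex in the text, in order.
    static List<String> matches(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Half Bone-in Spiral int with dark Packd Smithfield Half Bone-in Spiral Ham with Glaze Pack";
        // \b also matches between "-" and "i", so a chunk may end inside "Bone-in".
        System.out.println(matches("\\b.{1,32}\\b\\W?", s));
        // (?![\w-]) rejects any end position that is followed by a word char or "-".
        System.out.println(matches("\\b.{1,32}(?![\\w-])\\W?", s));
    }
}
```

With the original pattern the second match ends in "Bone-", while with the lookahead it ends before "Bone-in" starts.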

In Java, use the following method:

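import java.util.Arrays; // needed for Arrays.toString in the example
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits text into consecutive pieces, each one ending where a regex match ends.
public static String[] splitIncludeDelimeter(String regex, String text){
    List<String> list = new LinkedList<>();
    Matcher matcher = Pattern.compile(regex).matcher(text);

    int now, old = 0;
    while(matcher.find()){
        // take everything from the end of the previous match to the end of this one
        now = matcher.end();
        list.add(text.substring(old, now));
        old = now;
    }

    // no match at all: return the whole text as a single element
    if(list.isEmpty())
        return new String[]{text};

    // add the rest of the text as the last element (empty if the last match reached the end)
    String finalElement = text.substring(old);
    list.add(finalElement);

    return list.toArray(new String[0]);
}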

Java example:

String s = "Half Bone-in Spiral int with dark Packd Smithfield Half Bone-in Spiral Ham with Glaze Pack";
String[] res = splitIncludeDelimeter("(\\b.{1,32}(?![\\w-])\\W?)", s);
System.out.println(Arrays.toString(res));
// => [Half Bone-in Spiral int with , dark Packd Smithfield Half , Bone-in Spiral Ham with Glaze , Pack, ]
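
If the trailing spaces and the empty last element are not wanted, one possible variant is to collect the matches themselves and trim them. This is only a sketch: the names ChunkSplitter and splitIntoChunks are hypothetical, and unlike the method above it drops any text the pattern does not match instead of attaching it to the next piece.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkSplitter {
    // Same pattern as in the answer, compiled once.
    private static final Pattern CHUNK = Pattern.compile("\\b.{1,32}(?![\\w-])\\W?");

    // Hypothetical helper: returns each match with surrounding whitespace removed.
    static List<String> splitIntoChunks(String text) {
        List<String> chunks = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            chunks.add(m.group().trim());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String s = "Half Bone-in Spiral int with dark Packd Smithfield Half Bone-in Spiral Ham with Glaze Pack";
        System.out.println(splitIntoChunks(s));
        // => [Half Bone-in Spiral int with, dark Packd Smithfield Half, Bone-in Spiral Ham with Glaze, Pack]
    }
}
```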
Wiktor Stribiżew