
I want to understand how the group() method of Java's regex Matcher works.

When I use the regex

([\u25CB\u25CF])\s+([\u25CB\u25CF])\s+([\u25CB\u25CF])\s+([\u25CB\u25CF])\s+

on the text

Prozesse & Methoden
Technische Dokumentation    ○   ○   ●   ○
OSI Model   ○   ○   ●   ○

I would expect the first match to look like this: groups 0 to 3 containing "○", "○", "●", "○", i.e. four groups with one circle each.

But in fact it looks like this: "○ ○ ● ○", "○", "○", "●". Each group () spans only one character, so how can the first group encompass the whole expression?
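For reference, here is a minimal sketch of how Matcher numbers groups: group(0) is always the entire match, and the parenthesized groups are numbered from 1 to groupCount(). The class name GroupDemo, the helper firstMatchGroups and the SAMPLE string are illustrative only, and the sample assumes every line ends with a newline so that the pattern's final \s+ has something to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // The regex from the question. Backslashes are doubled in the Java literal,
    // so the regex engine (not the compiler) sees \u25CB, \u25CF and \s.
    static final Pattern CIRCLES = Pattern.compile(
            "([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+");

    // Sample text modeled on the question (assumption: each line ends with '\n').
    static final String SAMPLE = "Prozesse & Methoden\n"
            + "Technische Dokumentation    \u25CB   \u25CB   \u25CF   \u25CB\n"
            + "OSI Model   \u25CB   \u25CB   \u25CF   \u25CB\n";

    // Returns group(0) .. group(groupCount()) of the first match, or an empty array.
    static String[] firstMatchGroups(Pattern p, String text) {
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        // groupCount() counts only the capturing groups (4 here); +1 for group 0.
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = firstMatchGroups(CIRCLES, SAMPLE);
        for (int i = 0; i < groups.length; i++) {
            // trim() only keeps group 0's trailing newline out of the printed line
            System.out.println(i + ": [" + groups[i].trim() + "]");
        }
    }
}
```

Run against the sample, the array has five entries: index 0 holds the whole matched span (all four circles plus the whitespace between and after them), and indices 1 to 4 each hold a single circle.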

When I add an empty group at the end of my expression, it matches like this:

([\u25CB\u25CF])\s+([\u25CB\u25CF])\s+([\u25CB\u25CF])\s+([\u25CB\u25CF])\s+()

"○ ○ ● ○", "○", "○", "●", "○"

Now the last non-empty group (the fourth circle) shows up as well. I cannot understand why.
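One possible explanation, offered as an unverified assumption about how the results were displayed rather than anything confirmed for the tester used here: if a display lists only groupCount() entries starting at index 0, it silently drops the last capturing group. The sketch below reproduces both observed listings that way. The names EmptyGroupDemo, allGroups, offByOneGroups and LINE are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyGroupDemo {
    // The question's pattern, and the same pattern with an empty group () appended.
    static final Pattern FOUR = Pattern.compile(
            "([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+");
    static final Pattern FOUR_PLUS_EMPTY = Pattern.compile(
            "([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+()");

    // One row of the sample text (assumption: it ends with '\n' so the final \s+ matches).
    static final String LINE = "OSI Model   \u25CB   \u25CB   \u25CF   \u25CB\n";

    // Correct listing: group indices 0..groupCount() inclusive.
    static List<String> allGroups(Pattern p, String text) {
        return listGroups(p, text, true);
    }

    // HYPOTHETICAL buggy listing: only groupCount() entries starting at index 0,
    // which drops the last capturing group.
    static List<String> offByOneGroups(Pattern p, String text) {
        return listGroups(p, text, false);
    }

    private static List<String> listGroups(Pattern p, String text, boolean inclusive) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        if (m.find()) {
            int last = inclusive ? m.groupCount() : m.groupCount() - 1;
            for (int i = 0; i <= last; i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Pattern p : new Pattern[] {FOUR, FOUR_PLUS_EMPTY}) {
            // groupCount() comes from the pattern, so it is available before find()
            System.out.println("groupCount() = " + p.matcher(LINE).groupCount());
            System.out.println("  all groups : " + allGroups(p, LINE));
            System.out.println("  off by one : " + offByOneGroups(p, LINE));
        }
    }
}
```

With the original pattern groupCount() is 4, so the off-by-one listing ends at "●"; with the empty group appended it is 5, so the listing now reaches the fourth circle while the empty group(5) itself is never shown. That matches both outputs quoted above, but it is only a hypothesis about the display, not about Java's Matcher, which returns all five circles' groups correctly in both cases.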

I tested this both with Java 1.8 and on the website http://www.freeformatter.com/regex-tester.html.

  • Group zero encompasses the entire pattern. So, for the four bubbles you'll have to use group(1) to group(4). – laune Mar 26 '17 at 19:34
  • Your [regex](https://regex101.com/r/iklBaw/) is working as expected, each group matches 1 "circle". However the group 0 returns the whole matched string. – Alexander Farber Mar 29 '17 at 07:40
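Following the first comment's advice, a minimal sketch that reads only group(1) to group(4) for every matching row, looping with find() so that both rows of the sample are visited. AllRowsDemo, circlesPerMatch and SAMPLE are illustrative names, and the sample again assumes each line ends with a newline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllRowsDemo {
    static final Pattern CIRCLES = Pattern.compile(
            "([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+([\\u25CB\\u25CF])\\s+");

    // Assumption: every line, including the last, ends with '\n'.
    static final String SAMPLE = "Prozesse & Methoden\n"
            + "Technische Dokumentation    \u25CB   \u25CB   \u25CF   \u25CB\n"
            + "OSI Model   \u25CB   \u25CB   \u25CF   \u25CB\n";

    // Collects group(1)..group(groupCount()) of every match; group(0) is skipped on purpose.
    static List<List<String>> circlesPerMatch(String text) {
        Matcher m = CIRCLES.matcher(text);
        List<List<String>> rows = new ArrayList<>();
        while (m.find()) { // each call continues after the previous match
            List<String> row = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                row.add(m.group(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<String> row : circlesPerMatch(SAMPLE)) {
            System.out.println(row);
        }
    }
}
```

Because the pattern ends in \s+, a row whose last circle sits at the very end of the input (with no trailing space or newline) would not match at all; that is why the sample ends with "\n".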
