JFlex certainly has a lookahead facility, the same as (f)lex. Unlike Java regex lookahead assertions, JFlex lookahead can only be applied at the end of a pattern, but it is otherwise similar. It is described in the Semantics section of the JFlex manual:
In a lexical rule, a regular expression r may be followed by a look-ahead expression. A look-ahead expression is either $ (the end of line operator) or / followed by an arbitrary regular expression. In both cases the look-ahead is not consumed and not included in the matched text region, but it is considered while determining which rule has the longest match…
So you could certainly write the rule:
[:letter:]+\-[:letter:]/\s
However, you cannot put such a pattern in a macro definition (REGEX = …), as the manual also mentions (in the section on macros):
The regular expression on the right hand side must be well formed and must not contain the ^, / or $ operators.
So the lookahead operator can only be used in a pattern rule.
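To see what "not consumed" means in practice, here is a rough analogue in java.util.regex rather than JFlex (the class and method names are made up for illustration). The lookahead assertion (?=\s) plays the role of /\s, and \p{L} stands in approximately for [:letter:]. It is only a sketch: java.util.regex allows lookahead anywhere in a pattern and finds matches by backtracking search, not by JFlex's longest-match rule selection.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingContextDemo {
    // Rough analogue of the JFlex rule  [:letter:]+\-[:letter:]/\s
    // \p{L} approximates [:letter:]; the lookahead (?=\s) approximates /\s.
    private static final Pattern HYPHENATED =
            Pattern.compile("\\p{L}+-\\p{L}(?=\\s)");

    /** Returns the first matched text, or null if there is no match. */
    public static String firstMatch(String input) {
        Matcher m = HYPHENATED.matcher(input);
        // group() excludes the whitespace that the lookahead only checked
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("interferon-a is"));     // interferon-a
        System.out.println(firstMatch("interferon-alpha is")); // null
        System.out.println(firstMatch("interferon-a"));        // null: nothing follows
    }
}
```

The text returned by group() stops before the whitespace, which corresponds to the look-ahead being excluded from the matched text region in the JFlex rule.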
Note that \s matches any whitespace character, including newline characters, while . does not match any newline character. I think that's what led to your comment that REGEX = [:letter:]+\-[:letter:]\. "does not work if matched string does not have anything succeeding" (I'm guessing that you meant "does not have anything succeeding it on the same line", and also that you intended to write . rather than \.).
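The same kind of java.util.regex analogue (hypothetical class and method names; Java's notion of which characters are line terminators for . may differ slightly from JFlex's) shows the end-of-line difference: a trailing . fails when the next character is a newline, \s succeeds there, and neither succeeds when nothing at all follows.

```java
import java.util.regex.Pattern;

public class DotVersusWhitespace {
    // Lookahead on '.', which does not match a line terminator such as '\n'
    private static final Pattern BEFORE_DOT =
            Pattern.compile("\\p{L}+-\\p{L}(?=.)");
    // Lookahead on \s, which does match '\n'
    private static final Pattern BEFORE_SPACE =
            Pattern.compile("\\p{L}+-\\p{L}(?=\\s)");

    public static boolean matchesBeforeDot(String s) {
        return BEFORE_DOT.matcher(s).find();
    }

    public static boolean matchesBeforeSpace(String s) {
        return BEFORE_SPACE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesBeforeDot("interferon-a\n"));   // false
        System.out.println(matchesBeforeSpace("interferon-a\n")); // true
        System.out.println(matchesBeforeDot("interferon-a"));     // false
        System.out.println(matchesBeforeSpace("interferon-a"));   // false
    }
}
```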
Rather than testing for following whitespace, you might (depending on the language you are scanning) prefer to test for a non-word character:
[:letter:]+\-[:letter:]/\W
or to craft a more precise specification as a set of Unicode properties, as in the definition of \W (also found in the linked section of the JFlex manual).
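If you prototype these patterns with java.util.regex, be aware that Java's \W is ASCII-only by default; passing Pattern.UNICODE_CHARACTER_CLASS makes it follow Unicode properties, which is closer in spirit to the JFlex definition. The sketch below (again an analogue with made-up names, not JFlex itself) shows that difference on an accented letter, and also that \W, unlike \s, accepts following punctuation such as a closing parenthesis.

```java
import java.util.regex.Pattern;

public class NonWordLookahead {
    // Java's default \W is ASCII-only: the complement of [a-zA-Z_0-9]
    private static final Pattern ASCII_W =
            Pattern.compile("\\p{L}+-\\p{L}(?=\\W)");
    // UNICODE_CHARACTER_CLASS makes \w and \W follow Unicode properties
    private static final Pattern UNICODE_W =
            Pattern.compile("\\p{L}+-\\p{L}(?=\\W)", Pattern.UNICODE_CHARACTER_CLASS);

    public static boolean asciiMatch(String s) {
        return ASCII_W.matcher(s).find();
    }

    public static boolean unicodeMatch(String s) {
        return UNICODE_W.matcher(s).find();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) directly follows the single-letter suffix
        String accented = "interferon-a\u00e9 x";
        System.out.println(unicodeMatch("(interferon-a)")); // true: ')' is a non-word character
        System.out.println(asciiMatch(accented));            // true: ASCII \W treats U+00E9 as non-word
        System.out.println(unicodeMatch(accented));          // false: U+00E9 is a letter
    }
}
```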
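Having said all that, I'd like to repeat the advice from my previous answer to a similar question of yours: put more specific patterns first. For example, using the following pair of patterns will guarantee that the first one picks up words with a single-letter suffix, while avoiding the need to push back explicitly. When both patterns match the same length of input (as they do for interferon-a, since the look-ahead counts towards the length), JFlex chooses the rule that appears first.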
[:letter:]+(-[:letter:])? { /* matches 'interferon' or 'interferon-a' */ }
[:letter:]+/-[:letter:]+ { /* matches only 'interferon' from 'interferon-alpha' */ }
Of course, in this case you could easily avoid the collision between the second pattern and the first pattern by using {2,} instead of + for the second repetition, but it's perfectly OK to rely on pattern ordering since it's often inconvenient to guarantee that patterns don't overlap.
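To make the tie-breaking concrete, here is a toy Java simulation of the selection rule described above: every rule is tried at the current position, the longest match (counting the look-ahead) wins, and ties go to the rule listed first. It is only an illustration of the semantics, assuming that greedy java.util.regex matching finds the longest match for these particular patterns; JFlex itself compiles all rules into a single DFA rather than trying them one by one, and the Rule class and rule names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchSketch {
    /** A toy rule: group 1 is the consumed text; anything after it is look-ahead. */
    public static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Analogue of [:letter:]+(-[:letter:])?
    public static final Rule WORD = new Rule("word", "(\\p{L}+(?:-\\p{L})?)");
    // Analogue of [:letter:]+/-[:letter:]+ (only group 1 is consumed)
    public static final Rule PREFIX = new Rule("prefix", "(\\p{L}+)-\\p{L}+");
    // Catch-all so that every character is matched by some rule
    public static final Rule OTHER = new Rule("other", "(?s)(.)");

    /** Scans the input and returns tokens of the form "name:text". */
    public static List<String> scan(String input, Rule... rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLength = -1; // match length including the look-ahead
            int bestEnd = pos;   // end of the consumed part (group 1)
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                // Strict '>' means an earlier rule keeps the win on a tie
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    best = rule;
                    bestLength = m.end() - pos;
                    bestEnd = m.end(1);
                }
            }
            // Every rule here consumes at least one character, so this guards misuse
            if (best == null || bestEnd == pos) {
                throw new IllegalStateException("no rule consumes input at " + pos);
            }
            tokens.add(best.name + ":" + input.substring(pos, bestEnd));
            pos = bestEnd; // the look-ahead is left in the input
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("interferon-alpha", WORD, PREFIX, OTHER));
        // [prefix:interferon, other:-, word:alpha]
        System.out.println(scan("interferon-a", WORD, PREFIX, OTHER));
        // [word:interferon-a]
        System.out.println(scan("interferon-a", PREFIX, WORD, OTHER));
        // [prefix:interferon, other:-, word:a]
    }
}
```

Swapping the order of the first two rules makes interferon-a come out as interferon followed by the leftover -a, which is exactly the collision that putting the more specific pattern first avoids.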