I'm working on a (f)lex scanner and want to match numbers with the following regex:

(\B[+-]?[0-9]+)|([0-9]+)

The matches I want are shown in the picture:

[Image: the result I want]

The green and red highlights are the matches for the first and second alternatives of the regex, respectively. I ran this experiment at https://regex101.com/.
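To reproduce the experiment outside regex101, here is a minimal sketch in Python, chosen only for illustration because its `re` module supports `\B`. The helper name `find_numbers` and the sample inputs are hypothetical; they are not the inputs from the screenshot.

```python
import re

# The pattern from the question, unchanged.
NUMBER = re.compile(r"(\B[+-]?[0-9]+)|([0-9]+)")


def find_numbers(text):
    """Return every substring the pattern matches, scanning left to right."""
    return [m.group(0) for m in NUMBER.finditer(text)]


# Hypothetical sample inputs (assumptions, not taken from the screenshot).
for sample in ["a-1", "(-1)", "3+4", "3 + -4"]:
    print(sample, "->", find_numbers(sample))
# a-1 -> ['1']
# (-1) -> ['-1']
# 3+4 -> ['3', '4']
# 3 + -4 -> ['3', '-4']
```

In this sketch `\B` succeeds in front of the sign only when the preceding character is not a word character, or when the sign is at the start of the input. That is why `a-1` yields `1` while `(-1)` yields `-1`.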

However, when I tried to use this regex in lex, it did not work.

I have researched the reasons and some alternatives, but they don't fit my needs:

  1. This answer explains why the regex above does not work.
  2. This answer provides an alternative for similar cases, which may give you some ideas.

I hope someone could give me some ideas. Thank you!

  • Why do you feel like you need the word boundary check? If you are trying to attach signs to integer literals in some cases but not in others, you'll find yourself circling a sinkhole of special cases. It's almost always simpler to attach signs to numbers syntactically rather than lexically; just return the `-` as a token and let the parser figure it out. (F)lex doesn't have boundary checks. But if you can explain your needs, we might be able to find you a solution. – rici Oct 19 '20 at 04:08
  • By the way, the pattern you're trying to use would recognise the `-` in `(-a)` as a number. You probably meant `[0-9]+` instead of `[0-9]*`. – rici Oct 19 '20 at 04:11
  • @rici Actually, I want to use the word boundary check simply because it worked at regex101.com for my case (and it looks elegant, I admit it XD). And yes, letting the parser figure it out would be much easier, but I was asked to do it using (f)lex. – crechenko0609 Oct 19 '20 at 07:22
  • What were you "asked to do", exactly? If this is an assignment, it seems to me unlikely that they would ask for something which requires a feature flex doesn't provide. – rici Oct 19 '20 at 07:30
  • Regex101 is rarely if ever useful for building lexers, by the way. – rici Oct 19 '20 at 07:32
  • @rici Well, this is an assignment. But the assignment does not require the number detection above to be done with the word boundary feature; that was just my attempt. If there is any other way to do the number detection using (f)lex and regex, I would be glad to use it. – crechenko0609 Oct 19 '20 at 08:45
  • You need to tell us what your requirements are *exactly*. For example, you seem to be trying to match `-10` as a single token. Is that a requirement you were given or just how you decided to approach it? As rici already pointed out, just always matching `-` and `+` as their own tokens is much easier (and the way it's almost always done in practice). – sepp2k Oct 19 '20 at 20:37
