I'm working on lexical analysis in Java and want to break a given string into tokens, discarding whitespace. I use the regex below to match tokens such as alphanumeric words, numbers, and the most common operators and separators:

"[a-zA-Z0-9_]+|[\\[\\](){}.;,!<>+^%]"

However, multi-character operators such as `++`, `--`, `==`, `<=`, `>=`, `^=`, `*=`, and `+=` are difficult to handle. How can I improve my regex to cover them? Many thanks.

FSm
  • You seem to be matching only one character at a time (even for identifiers). Try using `{1,2}` after the operators, and `+` after the letters. – xs0 Dec 01 '17 at 14:02
  • You had one problem, decided to use regex, now got two. – revo Dec 01 '17 at 14:02
  • @revo, I agree with you. However, in my case, the regex is mandatory. – FSm Dec 01 '17 at 14:52
  • Java is not a regular language. So you are going to have a hard time dealing with things like `String foo="++What-is-this?++";` using regexes. Mandatory reference: https://stackoverflow.com/a/1732454/1466267 – SpaceTrucker Dec 01 '17 at 15:17
  • Space, thanks for your attention. By `in my case, the regex is mandatory` I meant that I have to use regex for study purposes. My project is not going to deal with complex Java code. – FSm Dec 01 '17 at 16:44

1 Answer

Try this one:

"[a-zA-Z0-9_]+|\+\+|--|<<|>>|[=+<>^*]=|[\[\](){}.;,!<>+^%]"

Explanation:

  • `\+\+` catches `++`
  • `--` catches `--`
  • `<<` catches `<<`
  • `>>` catches `>>`
  • `[=+<>^*]=` catches `==`, `+=`, `<=`, `>=`, `^=`, `*=`
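
For reference, here is a minimal Java sketch of how a pattern like the one above might be used. It assumes the identifier alternative carries a `+` quantifier so that whole words are matched. The class name `TokenizerSketch` and the method `tokenize` are made up for illustration. In a Java string literal each backslash has to be doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class TokenizerSketch {
    // Longer operators are listed before the single-character class so that
    // "++" or "<=" is tried before "+" or "<" at the same position.
    private static final Pattern TOKEN = Pattern.compile(
        "[a-zA-Z0-9_]+"                // words and numbers
        + "|\\+\\+|--|<<|>>"           // doubled operators
        + "|[=+<>^*]="                 // ==, +=, <=, >=, ^=, *=
        + "|[\\[\\](){}.;,!<>+^%]");   // single-character operators and separators

    // Collects every match in order; text that no alternative covers
    // (spaces, for example) is skipped by find().
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [count, ++, ;, a, <=, b1, ;, x, +=, y, >>, 2]
        System.out.println(tokenize("count++ ; a <= b1 ; x += y >> 2"));
    }
}
```

Because `find()` silently skips anything the pattern does not cover, spaces are discarded for free, but so is a lone `=` or `-`, since neither appears in the single-character class.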

Online test

Oneiros
  • Thanks Oneiros. How about operators like `>>` and `<<`? – FSm Dec 01 '17 at 14:35
  • Elegant, but I'm wondering why `\+\+` and not just `++`? – FSm Dec 01 '17 at 14:54
  • Because `+` is a special character in regex syntax: it means "one or more of the preceding element". Open the test link and try to remove the backslashes to see what happens (the best way to learn regex is to play around on these awesome online test websites). – Oneiros Dec 01 '17 at 14:56
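
The same experiment can be tried directly in Java. This is only a small sketch: the class name `EscapeSketch` and the helper `compiles` are hypothetical. It relies on `Pattern.compile` throwing `PatternSyntaxException` when a quantifier such as `+` has nothing before it to repeat.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class for illustration only.
class EscapeSketch {
    // Returns true when the regex compiles, false when Java rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unescaped: the leading '+' is a quantifier with nothing to repeat.
        System.out.println(compiles("++"));
        // Escaped: each "\\+" in the Java literal is the regex \+, a literal plus sign.
        System.out.println(compiles("\\+\\+"));
        // The escaped pattern matches the two-character text "++".
        System.out.println(Pattern.matches("\\+\\+", "++"));
    }
}
```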