
I am trying to build a parser for a Sonar plugin in which spaces and tabs are kept as tokens, so that I can use them to implement a rule that checks spacing. Therefore, I want spaces and tabs to be stored as different tokens.

I defined separate token types for tabs and spaces:

    .withChannel(regexp(TokenType.TAB, "\t"))
    .withChannel(regexp(TokenType.WHITESPACE, "\\s"))

But tabs are recognized as WHITESPACE tokens as well, because in Java the regex \s matches any whitespace character (space, tab, newline, carriage return, form feed, vertical tab).
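A quick check with java.util.regex illustrates the problem: \s accepts a tab just as readily as a space, while a literal space pattern does not. This is a minimal sketch; the class name WhitespaceDemo is only for illustration.

```java
import java.util.regex.Pattern;

class WhitespaceDemo {
    // True if the whole input matches the regex
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \s cannot tell a space from a tab
        System.out.println(matches("\\s", " "));   // true
        System.out.println(matches("\\s", "\t"));  // true
        // A literal space pattern rejects the tab
        System.out.println(matches(" ", "\t"));    // false
    }
}
```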

What's the right regexp to discriminate tabs from spaces?


2 Answers


I'm not familiar with the TokenType syntax, but to match all whitespace except tabs you could use:

    [ \n\x0b\r\f]

This works because \s is just shorthand for [ \t\n\x0B\f\r], so leaving out \t gives every whitespace character except the tab. Refer to the java.util.regex.Pattern documentation.
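To check the claim, the sketch below tests the character class from this answer and compares \s against its documented expansion for every char value. It assumes the default Pattern flags; with UNICODE_CHARACTER_CLASS, \s also matches other Unicode spaces and the equivalence no longer holds. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class NoTabWhitespace {
    // The answer's class: every \s character except the tab
    static final Pattern NO_TAB = Pattern.compile("[ \\n\\x0B\\r\\f]");
    // \s and the expansion the Pattern Javadoc gives for it
    static final Pattern SHORTHAND = Pattern.compile("\\s");
    static final Pattern EXPANDED = Pattern.compile("[ \\t\\n\\x0B\\f\\r]");

    static boolean isNonTabWhitespace(char c) {
        return NO_TAB.matcher(String.valueOf(c)).matches();
    }

    // True if \s and its expansion agree on every char value (default flags)
    static boolean shorthandMatchesExpansion() {
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            String s = String.valueOf((char) c);
            if (SHORTHAND.matcher(s).matches() != EXPANDED.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNonTabWhitespace(' '));     // true
        System.out.println(isNonTabWhitespace('\n'));    // true
        System.out.println(isNonTabWhitespace('\t'));    // false
        System.out.println(shorthandMatchesExpansion()); // true
    }
}
```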

  • That's not valid syntax for TokenType. Besides, in my case I don't need newlines or line endings at all, just whitespace, but "[ ]" doesn't do the job either. – FILIaS Jan 15 '14 at 14:17
  • So by "whitespace" you mean _spaces_? – MoRe Jan 15 '14 at 14:25

With:

    .withChannel(new BlackHoleChannel("\n"))        // removes newlines from source code
    .withChannel(regexp(TclTokenType.TAB, "\t"))    // matches tabs
    .withChannel(regexp(TokenType.WHITESPACE, " ")) // matches spaces

Spaces are matched correctly, and tabs are recognized as separate tokens. The key is the BlackHoleChannel.
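The setup depends on channel order. As far as I understand SSLR's lexer, channels are tried in registration order at each position, and the first one that consumes input wins. The plain-Java sketch below imitates that dispatch with java.util.regex only. It does not use the SSLR API, and the Channel class, the Kind enum and the catch-all word channel are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChannelSketch {
    // Hypothetical kinds standing in for TclTokenType.TAB / TokenType.WHITESPACE
    enum Kind { TAB, SPACE, OTHER }

    // A channel is a regex plus the kind it emits; a null kind discards the match
    static final class Channel {
        final Pattern pattern;
        final Kind kind;

        Channel(String regex, Kind kind) {
            this.pattern = Pattern.compile(regex);
            this.kind = kind;
        }
    }

    // Same order as the answer: drop newlines, then tabs, then single spaces.
    // The last channel is a hypothetical catch-all so every character is consumed.
    static final List<Channel> CHANNELS = List.of(
            new Channel("\n", null),
            new Channel("\t", Kind.TAB),
            new Channel(" ", Kind.SPACE),
            new Channel("[^ \\t\\n]+", Kind.OTHER));

    // Tries the channels in order at each position; the first match wins
    static List<Kind> kinds(String input) {
        List<Kind> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Channel hit = null;
            Matcher m = null;
            for (Channel ch : CHANNELS) {
                m = ch.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt()) {
                    hit = ch;
                    break;
                }
            }
            if (hit == null) {
                throw new IllegalStateException("No channel matches at offset " + pos);
            }
            if (hit.kind != null) {
                out.add(hit.kind);
            }
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [OTHER, SPACE, TAB, OTHER, SPACE, OTHER]; the newline is discarded
        System.out.println(kinds("a \tb\n c"));
    }
}
```

Because the space channel matches a single literal space, a run of spaces yields one SPACE token per character, which is convenient for a rule that counts them.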

This is FILIaS's solution from revision 15 of the question.
