I'm trying to write a regex that matches multi-line C comments. I managed to make it work for single-line comments such as /* comments here */, but it does not match when the comment continues onto the next line. How do I write a regex that spans multiple lines?
Using this as my input:
/* this comment
must be recognized */
The problem is that "must", "be" and "recognized" are matched as IDs, and */ is reported as illegal characters.
#!/usr/bin/python
import ply.lex as lex

tokens = ['ID', 'COMMENT']

t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_COMMENT(t):
    r'(?s)/\*(.*?).?(\*/)'
    #r'(?s)/\*(.*?).?(\*/)' does not work either.
    return t

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lex.lex()  # Build the lexer
lex.input('/* this comment\r\n must be recognised */\r\n')
while True:
    tok = lex.token()
    if not tok:
        break
    if tok.type == 'COMMENT':
        print(tok.type)
I have already tried the suggestions from "Create array of regex match(multiline)" and "How to handle multiple rules for one token with PLY", as well as a few other things from the PLY documentation at http://www.dabeaz.com/ply/ply.html
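As a sanity check outside PLY, here is a minimal sketch that uses only the standard `re` module on the same input string. It only tests the regex itself and says nothing about how PLY combines or compiles its rules. The names `sample`, `dotall_re`, `explicit_re`, `no_flag_re` and `find_comment` are made up for this illustration and are not part of PLY or of the lexer above.

```python
import re

# Sample input from the question: a C comment that spans two lines.
sample = '/* this comment\r\n must be recognised */\r\n'

# Same pattern as in t_COMMENT, with the inline (?s) written as re.DOTALL
# so that '.' also matches newline characters.
dotall_re = re.compile(r'/\*(.*?).?(\*/)', re.DOTALL)

# Alternative that needs no flag: '(.|\n)' matches newlines explicitly.
explicit_re = re.compile(r'/\*(.|\n)*?\*/')

# The same pattern without DOTALL, for comparison: '.' stops at '\n'.
no_flag_re = re.compile(r'/\*(.*?).?(\*/)')


def find_comment(pattern, text):
    """Return the first comment matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(repr(find_comment(dotall_re, sample)))
print(repr(find_comment(explicit_re, sample)))
print(repr(find_comment(no_flag_re, sample)))
```

With plain `re`, the first two patterns should return the whole comment and the third should return nothing, which suggests the newline handling of `.` is the part that matters for the regex itself.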