If you are unsure how your regular expression is being interpreted, you can check it with ppcre:parse-string:
CL-USER> (ppcre:parse-string "^/\w*$")
(:SEQUENCE :START-ANCHOR #\/ (:GREEDY-REPETITION 0 NIL #\w) :END-ANCHOR)
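The output shows that \w was interpreted as a literal w character: inside a Lisp string literal, a single backslash only escapes the next character, so the regex parser never sees it.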
Compare this with the expression you wanted to use:
CL-USER> (ppcre:parse-string "^/\\w*$")
(:SEQUENCE :START-ANCHOR #\/ (:GREEDY-REPETITION 0 NIL :WORD-CHAR-CLASS) :END-ANCHOR)
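To see the difference in practice, here is a small sketch comparing the two strings at the REPL. It assumes CL-PPCRE is already loaded (for example through Quicklisp); the test strings "/www" and "/abc" are made up for illustration.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload "cl-ppcre").

;; The Lisp reader consumes a single backslash inside a string literal,
;; so the two literals really are different strings.
(length "^/\w*$")    ; 5 characters: ^ / w * $
(length "^/\\w*$")   ; 6 characters: ^ / \ w * $

;; Single backslash: only runs of the letter w are accepted after the slash.
(ppcre:scan "^/\w*$" "/www")    ; matches
(ppcre:scan "^/\w*$" "/abc")    ; => NIL

;; Doubled backslash: any run of word characters is accepted.
(ppcre:scan "^/\\w*$" "/abc")   ; matches
```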
The value returned by ppcre:parse-string is a tree that represents the regular expression. You can in fact use this representation anywhere CL-PPCRE expects a regular expression. Although it is somewhat verbose, it makes it easier to build regexes out of smaller values, without having to worry about nesting strings or escaping special characters inside strings:
(defun maybe (regex)
  `(:greedy-repetition 0 1 ,regex))

(defparameter *simple-floats*
  (let ((digits '(:register (:greedy-repetition 1 nil :digit-class))))
    (ppcre:create-scanner `(:sequence
                            (:register (:regex "[+-]?"))
                            ,digits
                            ,(maybe `(:sequence "." ,digits))))))
In the example above, the dot "." is matched literally rather than treated as a regex metacharacter. This means you can match strings like "(^.^)" or "[]", which can be hard to write and read with escaped characters in string-only regexes. Inside a tree, you can still fall back to a regular expression written as a string by using a (:regex "...") expression.
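As a rough illustration of literal strings inside a parse tree, the sketch below matches "(^.^)" both ways. It assumes CL-PPCRE is already loaded; the target strings are invented for the example.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload "cl-ppcre").

;; In a parse tree, a string is matched character for character,
;; metacharacters included.
(ppcre:scan '(:sequence :start-anchor "(^.^)" :end-anchor) "(^.^)")  ; matches
(ppcre:scan '(:sequence :start-anchor "(^.^)" :end-anchor) "(^x^)")  ; => NIL

;; The string-only equivalent needs every metacharacter escaped, and every
;; backslash doubled for the Lisp reader.
(ppcre:scan "^\\(\\^\\.\\^\\)$" "(^.^)")                             ; matches

;; (:regex "...") embeds string syntax in a tree: a literal "[]"
;; followed by one or more digits.
(ppcre:scan '(:sequence "[]" (:regex "\\d+")) "x[]42")               ; matches
```

If the literal text is only known at run time and you still want a string regex, ppcre:quote-meta-chars can produce the escaped form for you.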
CL-PPCRE has an optimization where constant regular expressions are precomputed at load time, using load-time-value. That optimization might not be applied if your regular expressions are not trivially constant, so you may want to wrap your own scanners in load-time-value forms. Just make sure that the definitions they depend on, like the auxiliary maybe function, are available at load time.
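One possible way to arrange this is sketched below. The function name split-simple-float and the added anchors are assumptions made for illustration, not part of the code above, and CL-PPCRE is assumed to be loaded already.

```lisp
;; Assumes CL-PPCRE is already loaded, e.g. with (ql:quickload "cl-ppcre").

;; The helper has to exist when the LOAD-TIME-VALUE form below is evaluated.
;; EVAL-WHEN additionally makes it available while the file is compiled.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun maybe (regex)
    `(:greedy-repetition 0 1 ,regex)))

;; SPLIT-SIMPLE-FLOAT is a hypothetical name used for this example.
(defun split-simple-float (string)
  "Return (sign integer-part fraction-part) for STRING, or NIL if the
whole string is not a simple float."
  (ppcre:register-groups-bind (sign integer fraction)
      ;; The scanner is built once, at load time, not on every call.
      ((load-time-value
        (let ((digits '(:register (:greedy-repetition 1 nil :digit-class))))
          (ppcre:create-scanner
           `(:sequence :start-anchor
                       (:register (:regex "[+-]?"))
                       ,digits
                       ,(maybe `(:sequence "." ,digits))
                       :end-anchor))))
       string)
    (list sign integer fraction)))

(split-simple-float "-12.5")  ; => ("-" "12" "5")
(split-simple-float "abc")    ; => NIL
```

Note that the standard leaves the order in which load-time-value forms are evaluated, relative to the other top-level forms of a compiled file, implementation-dependent. A more conservative layout is to keep helpers such as maybe in a file that is loaded before the one using them.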