I am trying to learn Python 3 and I am a little bit stuck with regular expressions. I studied the HOWTO for this (this page), but I did not understand it very well.
1\d2\D2
^a\w+z$
you can generate example strings by reading the expression and choosing appropriate characters step by step.
for example, 1\d2\D2:
    1\d2\D2 -> 1
    ^               1 means a literal number 1
    1\d2\D2 -> 17
     ^^             \d means any digit (0-9). let's choose 7.
    1\d2\D2 -> 172
       ^            2 means a literal number 2.
    1\d2\D2 -> 172X
        ^^          \D means anything *but* a digit (0-9). let's choose X
    1\d2\D2 -> 172X2
          ^         2 means a literal number 2.
so 172X2 would be matched by 1\d2\D2
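a quick way to confirm the walk-through is to ask Python directly. this is only a minimal sketch: the test strings other than 172X2 are my own picks, and re.fullmatch is used so the whole string has to match.

```python
import re

pattern = re.compile(r"1\d2\D2")

# The string built step by step above should match in full.
matched = pattern.fullmatch("172X2") is not None

# Putting a digit where \D is expected should be rejected.
rejected = pattern.fullmatch("17232") is None

print(matched, rejected)  # True True
```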
your next one - ^a\w+z$ - can have multiple lengths:
    ^a\w+z$
    ^               this means we need to be at the start of a line (and we are, so that's cool)
    ^a\w+z$ -> a
     ^              a means a literal letter a
    ^a\w+z$ -> a4
      ^^            \w means a digit, letter, or "_". let's choose 4.
    ^a\w+z$ -> a4
        ^           + means we can return to whatever is to the left, if we want, so let's do that...
    ^a\w+z$ -> a4Q
      ^^            \w means a digit, letter, or "_". let's choose Q.
    ^a\w+z$ -> a4Q
        ^           + means we can return to whatever is to the left, if we want, so let's do that...
    ^a\w+z$ -> a4Q1
      ^^            \w means a digit, letter, or "_". let's choose 1.
    ^a\w+z$ -> a4Q1
        ^           + means we can return to whatever is to the left, but now let's stop
    ^a\w+z$ -> a4Q1z
         ^          z means a literal letter z
    ^a\w+z$ -> a4Q1z
          ^         $ means we must be at the end of the line, and we are (and so cannot add more)
so a4Q1z would be matched by ^a\w+z$. so would a4z (you can check...)
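here is one way to do that check, as a small sketch. the candidate list is my own; only a4Q1z and a4z come from the walk-through, the rest are there to show what the anchors and the + reject.

```python
import re

pattern = re.compile(r"^a\w+z$")

# Strings to try: the first two come from the walk-through,
# the others are illustrative near-misses.
candidates = ["a4Q1z", "a4z", "az", "a4Q1zz!", "ba4z"]

results = {s: bool(pattern.search(s)) for s in candidates}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

"az" fails because \w+ needs at least one character between the a and the final z; "ba4z" fails on ^ and "a4Q1zz!" fails on $.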
note that * is like + in that you can jump back and repeat, but it also means that you can completely skip what is to the left (in other words, + means "repeat at least once", but * means "repeat zero or more times", the "zero" being the skip).
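the difference is easiest to see side by side. a minimal sketch, using two made-up patterns (ab+c and ab*c) that differ only in the quantifier:

```python
import re

plus = re.compile(r"ab+c")   # b must appear at least once
star = re.compile(r"ab*c")   # b may be skipped entirely

# Map each test string to (matches with +, matches with *).
comparison = {
    s: (bool(plus.fullmatch(s)), bool(star.fullmatch(s)))
    for s in ["ac", "abc", "abbbc"]
}
print(comparison)
# {'ac': (False, True), 'abc': (True, True), 'abbbc': (True, True)}
```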
update:
[abc] means pick any one of a, b or c.
x{2,3} means add x 2 to 3 times (like + but with limits to the number of times). so, xx or xxx.
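both of these can be checked the same way. a short sketch, with test strings chosen by me to sit on either side of each rule:

```python
import re

char_class = re.compile(r"[abc]")   # exactly one of a, b, c
bounded = re.compile(r"x{2,3}")     # x repeated 2 or 3 times

# Keep only the strings that match in full.
class_hits = [s for s in ["a", "b", "c", "d", "ab"] if char_class.fullmatch(s)]
bounded_hits = [s for s in ["x", "xx", "xxx", "xxxx"] if bounded.fullmatch(s)]

print(class_hits)    # ['a', 'b', 'c']
print(bounded_hits)  # ['xx', 'xxx']
```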
\1 is a bit more complicated. you need to find what would have been inside the first (because of the number 1) set of parentheses and add that. so, for example, (\d+)\1 would match 2323 if you had worked from left to right and chosen 23 for (\d+).
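you can see what the first group captured by inspecting the match object. a small sketch; 2324 is my own counter-example where the second half does not repeat the first:

```python
import re

pattern = re.compile(r"(\d+)\1")

# The engine has to settle on 23 for the group so that \1 can repeat it.
m = pattern.fullmatch("2323")
captured = m.group(1) if m else None
print(captured)  # 23

# Here no choice for (\d+) is followed by an exact copy of itself.
no_match = pattern.fullmatch("2324") is None
print(no_match)  # True
```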
To generate some samples that would be matched, one would probably parse the regex and send each chunk of the regex to a function you would write, like getRandomSatisfyingText. Call it a bunch of times until you get 3 unique strings. It probably wouldn't be too hard until you started supporting zero-width assertions (lookaheads and lookbehinds).
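As a rough illustration of that idea, here is a minimal sketch that only understands a tiny subset of regex syntax: literal characters, \d, \D, \w, the + and * quantifiers, and the ^ and $ anchors. Everything in it is hypothetical: tokenize, generate_samples, the POOLS table and the max_repeat cap are names and choices I made up, and get_random_satisfying_text is just a Python-style spelling of the function named above. Each candidate is checked with re.fullmatch before it is kept, so unsupported syntax yields fewer samples rather than wrong ones.

```python
import random
import re
import string

# Characters to draw from for each supported escape (an assumed, small subset).
POOLS = {
    r"\d": string.digits,
    r"\D": string.ascii_letters,  # any non-digit would do; letters stay readable
    r"\w": string.ascii_letters + string.digits + "_",
}


def tokenize(pattern):
    """Split a pattern into (atom, quantifier) pairs, skipping anchors."""
    tokens = []
    i = 0
    while i < len(pattern):
        if pattern[i] in "^$":  # anchors consume no text
            i += 1
            continue
        # An escape is two characters long, a literal is one.
        atom = pattern[i:i + 2] if pattern[i] == "\\" else pattern[i]
        i += len(atom)
        quant = ""
        if i < len(pattern) and pattern[i] in "+*":
            quant = pattern[i]
            i += 1
        tokens.append((atom, quant))
    return tokens


def get_random_satisfying_text(atom, quant, rng, max_repeat=3):
    """Produce random text for one chunk of the pattern."""
    low = 0 if quant == "*" else 1
    high = max_repeat if quant else 1
    count = rng.randint(low, high)
    pool = POOLS.get(atom, atom)  # a literal acts as a pool of one character
    return "".join(rng.choice(pool) for _ in range(count))


def generate_samples(pattern, how_many=3, seed=0, max_attempts=1000):
    """Collect up to how_many unique strings that the real engine accepts."""
    rng = random.Random(seed)
    tokens = tokenize(pattern)
    samples = set()
    for _ in range(max_attempts):  # bounded, in case fewer samples exist
        if len(samples) >= how_many:
            break
        candidate = "".join(
            get_random_satisfying_text(a, q, rng) for a, q in tokens
        )
        if re.fullmatch(pattern, candidate):
            samples.add(candidate)
    return sorted(samples)


print(generate_samples(r"1\d2\D2"))
print(generate_samples(r"^a\w+z$"))
```

Groups, character classes, {m,n} and backreferences would each need their own handling in tokenize, and lookarounds are harder still because they constrain text that other chunks generate.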