I am trying to learn Python 3 and I am a little stuck on regular expressions. I studied the HOWTO for this (this page), but I did not understand it very well.

    1\d2\D2
    ^a\w+z$
indiag
  • It's unclear what you're asking. Do you want example strings that match these regular expressions? – robert May 26 '12 at 20:11
  • possible duplicate of [Reversing a regular expression in python](http://stackoverflow.com/questions/492716/reversing-a-regular-expression-in-python) – agent-j May 26 '12 at 20:19

2 Answers

you can generate example strings by reading the expression and choosing appropriate characters step by step.

for example, 1\d2\D2:

1\d2\D2 -> 1
^ 1 means a literal number 1

1\d2\D2 -> 17
 ^^ \d means any digit (0-9).  let's choose 7.

1\d2\D2 -> 172
   ^ 2 means a literal number 2.

1\d2\D2 -> 172X
    ^^ \D means anything *but* a digit (0-9).  let's choose X

1\d2\D2 -> 172X2
      ^ 2 means a literal number 2.

so 172X2 would be matched by 1\d2\D2
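
if you want to check the result rather than trust the walkthrough, here is a minimal sketch. using re.fullmatch (so the whole string has to match) is my choice here, and the second test string is just an extra counterexample i picked:

```python
import re

pattern = re.compile(r"1\d2\D2")

# fullmatch only succeeds if the whole string fits the pattern
print(bool(pattern.fullmatch("172X2")))  # True
# here the fourth character is the digit 3, which \D refuses
print(bool(pattern.fullmatch("17232")))  # False
```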

your next one - ^a\w+z$ - can match strings of different lengths:

^a\w+z$
^ this means we need to be at the start of a line (and we are, so that's cool)

^a\w+z$ -> a
 ^ a means a literal letter a

^a\w+z$ -> a4
  ^^ \w means a digit, letter, or "_".  let's choose 4.

^a\w+z$ -> a4
    ^ + means we can return to whatever is to the left, if we want, so let's do that...

^a\w+z$ -> a4Q
  ^^ \w means a digit, letter, or "_".  let's choose Q.

^a\w+z$ -> a4Q
    ^ + means we can return to whatever is to the left, if we want, so let's do that...

^a\w+z$ -> a4Q1
  ^^ \w means a digit, letter, or "_".  let's choose 1.

^a\w+z$ -> a4Q1
    ^ + means we can return to whatever is to the left, but now let's stop

^a\w+z$ -> a4Q1z
     ^ z means a literal letter z

^a\w+z$ -> a4Q1z
      ^ $ means we must be at the end of the line, and we are (and so cannot add more)

so a4Q1z would be matched by ^a\w+z$. so would a4z (you can check...)
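
a quick way to do that check, as a sketch. the last two candidates are extra strings i added to show what gets rejected: "az" leaves nothing for \w+ to consume, and "a4Q1zX" does not end in z.

```python
import re

pattern = re.compile(r"^a\w+z$")

candidates = ["a4Q1z", "a4z", "az", "a4Q1zX"]
# map each candidate to whether the pattern accepts it
results = {text: bool(pattern.search(text)) for text in candidates}

for text, matched in results.items():
    print(text, matched)
```

the first two come out as matches and the last two do not.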

note that * is like + in that you can jump back and repeat, but it also lets you completely skip what is to the left. in other words, + means "repeat one or more times", while * means "repeat zero or more times" (the "zero" being the skip).
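
to see the difference between the two, here is a small sketch. the pattern ^a\w*z$ is a variant i made up for the comparison, it is not one of the patterns from the question:

```python
import re

plus = re.compile(r"^a\w+z$")  # at least one \w between a and z
star = re.compile(r"^a\w*z$")  # the \w part may be skipped entirely

print(bool(plus.search("az")))   # False: + needs at least one character
print(bool(star.search("az")))   # True: * allows zero repetitions
print(bool(star.search("a4z")))  # True: * also allows repeats
```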

update:

[abc] means pick any one of a, b or c.

x{2,3} means add x 2 to 3 times (like + but with limits to the number of times). so, xx or xxx.

\1 is a bit more complicated. you need to find what was matched inside the first set of parentheses (first because of the number 1) and add that again. so, for example, (\d+)\1 would match 2323 if you had worked from left to right and chosen 23 for (\d+).
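
the three constructs from the update can be tried out like this. it is only a sketch, and the test strings other than 2323 are ones i picked for illustration:

```python
import re

# [abc] accepts exactly one of a, b or c
print(bool(re.fullmatch(r"[abc]", "b")))  # True
print(bool(re.fullmatch(r"[abc]", "d")))  # False

# x{2,3} accepts xx or xxx, nothing shorter or longer
print(bool(re.fullmatch(r"x{2,3}", "xx")))    # True
print(bool(re.fullmatch(r"x{2,3}", "xxxx")))  # False

# \1 must repeat whatever the first group matched
match = re.fullmatch(r"(\d+)\1", "2323")
print(match.group(1))                           # 23
print(bool(re.fullmatch(r"(\d+)\1", "2324")))  # False
```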

andrew cooke

To generate some samples that would be matched, you would probably parse the regex and send each chunk of it to a function you write yourself, something like getRandomSatisfyingText. Call it repeatedly until you get 3 unique strings. It probably wouldn't be too hard until you started supporting lookaround assertions (lookaheads and lookbehinds).
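
A rough sketch of that idea follows, under heavy assumptions. It only understands literal characters, the escapes \d, \D and \w, the anchors ^ and $, and the quantifiers + and *. Everything else (character classes, groups, backreferences, lookarounds) is out of scope. The names tokenize, generate_samples, POOLS and max_repeat are hypothetical, and get_random_satisfying_text is just a Python spelling of the function name above.

```python
import random
import re
import string

# Character pools for the few escapes this sketch understands. ASCII-only by
# assumption: in Python 3, \w and \D really match far more characters.
POOLS = {
    "d": string.digits,
    "D": string.ascii_letters,
    "w": string.ascii_letters + string.digits + "_",
}


def tokenize(pattern):
    """Split a pattern into (pool, quantifier) chunks."""
    chunks = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in "^$":  # anchors contribute no characters
            i += 1
            continue
        if ch == "\\":
            pool = POOLS[pattern[i + 1]]
            i += 2
        else:
            pool = ch  # a literal character stands for itself
            i += 1
        quantifier = ""
        if i < len(pattern) and pattern[i] in "+*":
            quantifier = pattern[i]
            i += 1
        chunks.append((pool, quantifier))
    return chunks


def get_random_satisfying_text(pool, quantifier, max_repeat=3):
    """Produce random text for one chunk, honouring its quantifier."""
    low = 0 if quantifier == "*" else 1
    high = max_repeat if quantifier else 1
    count = random.randint(low, high)
    return "".join(random.choice(pool) for _ in range(count))


def generate_samples(pattern, wanted=3, max_attempts=1000):
    """Collect up to `wanted` unique strings; give up after max_attempts."""
    chunks = tokenize(pattern)
    samples = set()
    for _ in range(max_attempts):
        text = "".join(get_random_satisfying_text(p, q) for p, q in chunks)
        samples.add(text)
        if len(samples) == wanted:
            break
    return sorted(samples)


for pattern in [r"1\d2\D2", r"^a\w+z$"]:
    samples = generate_samples(pattern)
    # Feed each sample back through re as a sanity check.
    print(pattern, samples, all(re.fullmatch(pattern, s) for s in samples))
```

The attempt cap is there because a pattern such as a plain literal has only one possible string, so a loop that insists on 3 unique samples would never finish.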

agent-j