I have this pattern p and want to use it to check whether some text contains any matches. This is in Python.

import re
p = "keyword" + r".*?(\d+(\.\d+)?[\s%]?[\w/]*)"
found = re.findall(p, some_text)

I am having trouble parsing this regex.

  1. What is the first "?"?

    I understand that ".*" matches anything zero or more times, but I am not sure what the "?" does here.

  2. It is weird to see nested capture group parentheses. What do they do?

  3. What is the "?" in the [\s%]? part? I assume this matches white space followed by "%", but I am not sure what the "?" does here.

  4. What is the asterisk in the [\w/]* part? I assume this matches any word character followed by a forward slash, but I am not sure what the "*" does.

leopoodle
    Possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Mohammad Yusuf Jan 19 '17 at 06:25

1 Answer

.*?(\d+(\.\d+)?[\s%]?[\w/]*)
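  1. .*? matches any character (except line terminators). The *? quantifier matches between zero and unlimited times, as few times as possible, expanding as needed (lazy).
  2. 1st capturing group: (\d+(\.\d+)?[\s%]?[\w/]*)
  3. \d+ matches a digit (equal to [0-9]). The + quantifier matches between one and unlimited times, as many times as possible, giving back as needed (greedy).
  4. 2nd capturing group: (\.\d+)? The trailing ? makes the whole group optional (zero or one time), so the number may or may not have a decimal part.
  5. \. matches the character . literally.
  6. [\s%]? matches a single character from the set, where \s is any whitespace character (space, \r, \n, \t, \f or \v). The ? makes that character optional (zero or one time).
  7. % inside the set matches the character % literally, so the set means "whitespace or %", not "whitespace followed by %".
  8. [\w/]* matches a single character from the set, where \w is any word character (a-zA-Z0-9_). The * quantifier repeats it between zero and unlimited times, as many times as possible (greedy).
  9. / inside the set matches the character / literally, so the set means "word character or /".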
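
To see these pieces in action, here is a small sketch. The sample strings are made up for illustration, and GREEDY is a variant I added only to contrast with the lazy .*? from the question. Because the pattern has two capturing groups, re.findall returns a (group 1, group 2) tuple for each match, and an optional group that did not take part in the match shows up as an empty string.

```python
import re

# The pattern from the question, written as a raw string so the backslashes
# reach the regex engine unchanged.
LAZY = "keyword" + r".*?(\d+(\.\d+)?[\s%]?[\w/]*)"
# Same pattern with a greedy ".*" instead of the lazy ".*?", for comparison only.
GREEDY = "keyword" + r".*(\d+(\.\d+)?[\s%]?[\w/]*)"

# Made-up sample inputs (not from the question).
percent = re.findall(LAZY, "keyword: 12.5% of total")
speed = re.findall(LAZY, "keyword 3 km/h")
lazy_hit = re.findall(LAZY, "keyword 10, 20")
greedy_hit = re.findall(GREEDY, "keyword 10, 20")

print(percent)     # [('12.5%', '.5')]  group 2 holds the decimal part
print(speed)       # [('3 km/h', '')]   no decimal part, so group 2 is ''
print(lazy_hit)    # [('10', '')]       lazy .*? stops at the first digit it can
print(greedy_hit)  # [('0', '')]        greedy .* leaves only the last digit
```

The last two lines show why the ? after .* matters: the greedy version swallows as much as it can and gives back only the single digit that \d+ needs.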

You can put your regex in here and get the analysis at the top right of the site.

Mustofa Rizwan
  • Thanks a lot. I have a few more questions. 10. In `[\s%]?`, what does the "?" do? 11. In `[\w/]*`, what does the "*" do? 12. This regex mostly works for my purposes, but I also need to extract the number from text like "KeyA: 5555\n KeyB:". I want to extract 5555 in this case, but this regex extracts "5555\n KeyB". Do you know how to modify it? Note that I still need to match things like % and forward slash (i.e. I cannot remove them from the regex). – leopoodle Jan 19 '17 at 16:30
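
On point 12, one possible approach (a sketch, not something confirmed in this thread) is to stop the optional separator from matching a newline. \s includes \n, so [\s%]? can step onto the next line and [\w/]* then picks up the next key. Replacing it with [ \t%]? keeps space, tab and % but excludes line breaks. The sample text and the helper first_value below are hypothetical names chosen for illustration.

```python
import re

# Assumed sample text, modelled on the comment above.
text = "KeyA: 5555\nKeyB: 12.5%\nKeyC: 3 km/h"

# Tail of the pattern from the question.
ORIGINAL = r".*?(\d+(\.\d+)?[\s%]?[\w/]*)"
# Hypothetical variant: [ \t%] allows a space, a tab, or "%", but not a newline.
SAME_LINE = r".*?(\d+(\.\d+)?[ \t%]?[\w/]*)"

def first_value(key, tail, source):
    """Return group 1 of the first match for key + tail, or None."""
    match = re.search(re.escape(key) + tail, source)
    return match.group(1) if match else None

print(repr(first_value("KeyA", ORIGINAL, text)))   # '5555\nKeyB'
print(repr(first_value("KeyA", SAME_LINE, text)))  # '5555'
print(repr(first_value("KeyB", SAME_LINE, text)))  # '12.5%'
print(repr(first_value("KeyC", SAME_LINE, text)))  # '3 km/h'
```

This still captures a following word when it sits on the same line after a single space (for example "5555 KeyB"), so it only helps when the next key starts on a new line.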