I am trying to match records that each span four lines. My code finds the first record but not the others.

Here is the pattern:

pattern = re.compile("([a-z]+\.com\.|net\.)[.\s\S]+(Z[A-Z0-9]+)")

Here is the subject:

sub = """yahoo.com.
Public
8
Z2RVE9JJGX4PFJN
google.com.
Public
7
Z2VATZOPLBDR5D
"""

Here is the complete code:

import re
pattern = re.compile("([a-z]+\.com\.|net\.)[.\s\S]+(Z[A-Z0-9]+)")

sub = """yahoo.com.
Public
8
Z2RVE9JJGX4PFJN
google.com.
Public
7
Z2VATZOPLBDR5D
"""

m = pattern.findall(sub)

print(m)

Here is the result:

[('yahoo.com.', 'Z2RVE9JJGX4PFJN')]

And finally, here is the desired result:

[('yahoo.com.', 'Z2RVE9JJGX4PFJN'), ('google.com.', 'Z2VATZOPLBDR5D')]

Thank you.

1 Answer

You are close. Just make your match less greedy:

import re
pattern = re.compile(r"([a-z]+\.com\.|net\.)[\s\S]+?(Z[A-Z0-9]+)")
# Note the 'less greedy' addition                  ^
# The '.' was dropped from the class here    ^ (a literal dot is already covered by \S)
sub = """yahoo.com.
Public
8
Z2RVE9JJGX4PFJN
google.com.
Public
7
Z2VATZOPLBDR5D
"""

m = pattern.findall(sub)

print(m)

Prints:

[('yahoo.com.', 'Z2RVE9JJGX4PFJN'), ('google.com.', 'Z2VATZOPLBDR5D')]
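
To make the difference visible, here is a small side-by-side sketch using the same subject text. The variable names (`greedy`, `lazy`, and so on) are only illustrative. The greedy `[\s\S]+` runs to the end of the text and backtracks only as far as it must, so the whole input collapses into one match; the lazy `[\s\S]+?` stops at the first ID it can reach.

```python
import re

sub = """yahoo.com.
Public
8
Z2RVE9JJGX4PFJN
google.com.
Public
7
Z2VATZOPLBDR5D
"""

# Same pattern twice; the only difference is the '?' after the '+'.
greedy = re.compile(r"([a-z]+\.com\.|net\.)[\s\S]+(Z[A-Z0-9]+)")
lazy = re.compile(r"([a-z]+\.com\.|net\.)[\s\S]+?(Z[A-Z0-9]+)")

greedy_matches = greedy.findall(sub)  # one match that swallows both records
lazy_matches = lazy.findall(sub)      # one match per record

print(len(greedy_matches), greedy_matches)
print(len(lazy_matches), lazy_matches)
```

If you need positions as well as text, `pattern.finditer(sub)` yields match objects instead of tuples.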

For greater specificity on the ends of your patterns, you may want to use anchors:

pattern = re.compile(r"^([a-z]+\.com\.|net\.)$[\s\S]+?^(Z[A-Z0-9]+)$", re.M)
# Start of line        ^                              ^
# End of line                                ^                     ^
# Multi-line flag                                                       ^
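
One more thing to watch, as a hedged aside: in `[a-z]+\.com\.|net\.` the `|` splits the whole group, so the second alternative is the bare text `net.` and not "some name followed by `.net.`". The sketch below assumes you meant "a name ending in `.com.` or `.net.`" and uses a non-capturing group for that. The `example.net.` record is made-up sample data, not from the question.

```python
import re

# Hypothetical sample: the second record uses a .net name, which the
# original question did not include.
sub = """yahoo.com.
Public
8
Z2RVE9JJGX4PFJN
example.net.
Public
7
Z2VATZOPLBDR5D
"""

# As written: alternatives are '[a-z]+\.com\.' OR the bare text 'net\.'
original = re.compile(r"^([a-z]+\.com\.|net\.)$[\s\S]+?^(Z[A-Z0-9]+)$", re.M)

# Assumed intent: a name followed by either '.com.' or '.net.'
grouped = re.compile(r"^([a-z]+\.(?:com|net)\.)$[\s\S]+?^(Z[A-Z0-9]+)$", re.M)

original_matches = original.findall(sub)
grouped_matches = grouped.findall(sub)

print(original_matches)  # the .net record is skipped
print(grouped_matches)   # both records are found
```

With the anchors in place, the original alternation can only match a line that is exactly `net.`, which is why the `.net` record is skipped.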
dawg
  • Thank you! Works perfectly. I'd mark it correct if I knew how. – dev.user.23 May 29 '17 at 18:28
  • I went back and studied a bit. As a side note, I'd like to point out where I think I went wrong for others who have mediocre regex skills. Here "[.\s\S]". I believe that "." was an important part of the problem, in that it matched everything in a greedy fashion until the last (Z[A-Z0-9]+). Thanks again, Dawg. – dev.user.23 May 29 '17 at 19:03
  • The portion `[\s\S]` matches any character including `\n`. It will run right over the next match. Rarely used without `?` unless you want to run all the way to the end of something. – dawg May 29 '17 at 22:39
  • Aww I see. Thanks so much for the education. – dev.user.23 May 30 '17 at 00:30
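
On the `[.\s\S]` point raised in the comments, a small check may help. Inside a character class, `.` is just a literal dot, so `[.\s\S]` and `[\s\S]` should behave the same; the run-over comes from the greedy `+`, not from the dot. The sketch below uses a made-up three-character-plus-newline string and also shows `re.DOTALL`, a standard flag that lets a bare `.` match newlines, as an alternative spelling.

```python
import re

text = "a.b\nc"  # illustrative input containing a dot and a newline

# Inside brackets, '.' matches only a literal dot.
dot_in_class = re.findall(r"[.]", text)

# With or without the dot, the class matches any character, newline included.
with_dot = re.search(r"a[.\s\S]+c", text).group()
without_dot = re.search(r"a[\s\S]+c", text).group()

# Alternative spelling: a plain '.' with the DOTALL flag also crosses newlines.
dotall = re.search(r"a.+c", text, re.DOTALL).group()

print(dot_in_class)
print(with_dot == without_dot == dotall)
```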