Python look for pattern in string

Question

I am having trouble understanding the regular expressions module in Python. I think what I am trying to do is fairly simple, but I cannot figure it out.

I need to search through some xml files and find this pattern:

'DisplayName="Parcels (10-1-2012)"'

I can parse the XML and make replacements with no problem. The part I cannot figure out is how to do a wildcard search to find any instance of "Parcels (some-date-year)". Since the date will vary, I need to find this pattern:

pat = '"Parcels (*-*-*)"'

and I want to replace it with today's date which I can do with the time module. I copied out a line of one of the 80 or so xml docs where I would need to find the pattern.

According to the help for the re.search() function, it seems I can just put in a pattern, then the string I wish to search through. However, I am getting errors.

Help on function search in module re:

search(pattern, string, flags=0) Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.

Here is my little test snippet:

import re
pat = '"Parcels (*-*-*)"'
t= '         <Layer DisplayName="Parcels (7-1-2010)" FeatureDescription="Owner Name: {OWNER_NAME}&lt;br/&gt;Property Address: {PROP_ADDR}&lt;br/&gt;Tax Name: {TAX_NAME}&lt;br/&gt;Tax Address 1: {TAX_ADD_L1}&lt;br/&gt;Tax Address 2: {TAX_ADD_L2}&lt;br/&gt;Land Use: {USE1_DESC}&lt;br/&gt;&lt;a href=&quot;http://www16.co.hennepin.mn.us/pins/pidresult.jsp?pid={PID_NO}&quot;&gt;View Property Information&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;" FeatureLabel="Parcel ID: {PID_NO}" IconUri="{RestVirtualDirectoryUrl}/Images/Parcel.png" Identifiable="true" IncludeInLayerList="true" IncludeInLegend="true" Name="Parcels" Searchable="true" ShowMapTips="true" UnconfiguredFieldsSearchable="true" UnconfiguredFieldsVisible="true" Visible="true">'
match = re.search(pat, t)
print match

Most of the line is junk I don't need to worry about. I just need to see how I can find that date in the line so I can use just that piece in the replace() function. Does anyone know how I could find these dates? There may be other dates in the xml somewhere, but I don't need to replace these; just where it says "Parcels (some-date-year)". I appreciate any help! Thanks!

The pattern needs to be a regular expression. http://docs.python.org/2/howto/regex.html may help you? — Wooble, Jan 30 '14 at 16:11
Did you get an error from this? Is this relevant? http://stackoverflow.com/questions/3675144/regex-error-nothing-to-repeat — doctorlove, Jan 30 '14 at 16:15

Russia Must Remove Putin · Accepted Answer · 2014-01-30T16:34:51.363

import re

t= '         <Layer DisplayName="Parcels (7-1-2010)" FeatureDescription="Owner Name: {OWNER_NAME}&lt;br/&gt;Property Address: {PROP_ADDR}&lt;br/&gt;Tax Name: {TAX_NAME}&lt;br/&gt;Tax Address 1: {TAX_ADD_L1}&lt;br/&gt;Tax Address 2: {TAX_ADD_L2}&lt;br/&gt;Land Use: {USE1_DESC}&lt;br/&gt;&lt;a href=&quot;http://www16.co.hennepin.mn.us/pins/pidresult.jsp?pid={PID_NO}&quot;&gt;View Property Information&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;br/&gt;" FeatureLabel="Parcel ID: {PID_NO}" IconUri="{RestVirtualDirectoryUrl}/Images/Parcel.png" Identifiable="true" IncludeInLayerList="true" IncludeInLegend="true" Name="Parcels" Searchable="true" ShowMapTips="true" UnconfiguredFieldsSearchable="true" UnconfiguredFieldsVisible="true" Visible="true">'

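You need to escape the parentheses, because unescaped parentheses start a group in a regular expression. After that you can be as specific as you like about the contents. The generic character is ., which matches any character (except a newline by default), and * means zero or more of the preceding item: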

# raw string, so the backslashes reach the regex engine unchanged
pat = r'"Parcels \(.*\)"'
match = re.search(pat, t)
print(match.group())

Which prints:

"Parcels (7-1-2010)"

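One caveat with .* is that it is greedy: if a later )" appeared on the same line, the match would run all the way to the last one. Here is a minimal sketch of the difference, using a made-up line (the Note attribute is hypothetical and not from the question's XML) and the non-greedy form .*? for comparison:

```python
import re

# Hypothetical line with a second ')"' later on; not taken from the real XML.
line = '<Layer DisplayName="Parcels (7-1-2010)" Note="see (appendix)">'

# Greedy: .* grabs as much as possible, so the match ends at the LAST ')"'
greedy = re.search(r'"Parcels \(.*\)"', line).group()

# Non-greedy: .*? grabs as little as possible, so the match ends at the FIRST ')"'
lazy = re.search(r'"Parcels \(.*?\)"', line).group()

print(greedy)
print(lazy)
```

On the line from the question this makes no difference, since there is no other )" after the date, but it is worth knowing when the same pattern is run over many files.
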
A more specific pattern would be:

pat = r'"Parcels \([0-9]+-[0-9]+-[0-9]+\)"'
match = re.search(pat, t)
print(match.group())

Which prints:

"Parcels (7-1-2010)"

Here, the bracket expression [0-9] matches any single digit from 0 to 9 (\d is nearly equivalent, though with Python 3 str patterns it also matches other Unicode digits unless re.ASCII is set). The plus, +, after each one means one or more, and the dash matches itself literally.
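Since the end goal in the question is to swap the old date for today's, the same pattern can probably be handed straight to re.sub rather than searching first and calling replace() afterwards. This is only a sketch under some assumptions: the function name update_parcels_date and its when parameter are made up for illustration, the output format is assumed to be month-day-year without zero padding (as in the examples), and datetime is used here instead of the time module mentioned in the question.

```python
import re
import datetime

def update_parcels_date(text, when=None):
    """Replace the date in every "Parcels (m-d-yyyy)" with the given date.

    Hypothetical helper; `when` defaults to today's date.
    """
    when = when or datetime.date.today()
    # Assumed format: month-day-year, no zero padding (e.g. 7-1-2010)
    new_date = '{}-{}-{}'.format(when.month, when.day, when.year)
    # Group 1 captures the '"Parcels (' prefix, group 2 the closing ')"'
    pattern = r'("Parcels \()[0-9]+-[0-9]+-[0-9]+(\)")'
    # \g<1> and \g<2> re-insert the captured prefix and suffix around the new date
    return re.sub(pattern, r'\g<1>' + new_date + r'\g<2>', text)

line = '<Layer DisplayName="Parcels (7-1-2010)" Updated="(1-1-2009)" Name="Parcels">'
print(update_parcels_date(line, datetime.date(2014, 1, 30)))
```

Other dates on the line are left alone because the pattern is anchored on the literal "Parcels ( prefix. The \g<1> form is used instead of \1 so that the digits of the new date cannot be misread as part of the group number.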

Thanks Aaron! The re module's help is a little confusing. I need to do some more reading there. That did the trick though. — crmackey, Jan 30 '14 at 16:27
Thanks for the second option as well. This makes more sense to me than the first, and would probably be better at flagging the numeric characters. Thanks again! — crmackey, Jan 30 '14 at 16:29
I created a quick ref card based on the help with a small demo at the end. I should probably publish it. It basically lists the special characters, special sequences, module functions, flags, help on functions, (fairly complete descriptions) and a small Verbose example at the end of my own. — Russia Must Remove Putin, Jan 30 '14 at 16:30
Please do! I would like to see it; it would probably be easier to understand than the online help. Please post the location if you choose to publish. — crmackey, Jan 30 '14 at 17:00

score 1 · Answer 2 · answered Jan 30 '14 at 16:31

Aaron's answer is good. Here is a small modification that matches the dash-separated format you specified:

import re

the_string = '<Layer DisplayName="Parcels (7-1-2010)" ... blablabla '
pattern = r'Parcels \(.*-.*-.*\)'
match = re.search(pattern, the_string)
print(match.group())

Also, if you suspect the string may contain more than one match, you can collect all of them with the re.findall function. I've also used \d+ here, which matches one or more digits:

import re

the_string = '<Layer DisplayName="Parcels (7-1-2011)" ... blablabla ... Layer DisplayName="Parcels (7-1-2012)" '
pattern = r'Parcels \(\d+-\d+-\d+\)'
all_matches = re.findall(pattern, the_string)
for match in all_matches:
    print(match)
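
If the individual parts of the date are needed, capturing groups can be added to the same pattern. With groups present, re.findall returns a list of tuples instead of whole-match strings. A short sketch follows; the names month, day and year assume the dates are written month-day-year, which the question does not state explicitly.

```python
import re

the_string = ('<Layer DisplayName="Parcels (7-1-2011)" ... blablabla ... '
              'Layer DisplayName="Parcels (7-1-2012)" ')

# Each (\d+) is a capturing group; findall yields one tuple of groups per match
pattern = r'Parcels \((\d+)-(\d+)-(\d+)\)'
dates = re.findall(pattern, the_string)

for month, day, year in dates:  # month/day order is an assumption
    print(month, day, year)
```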
