I've been putting together a list of pages that we need to update with new content (we're switching media formats). In the process, I'm also cataloging pages that already have the new content.
Here's the general idea of what I'm doing:
- Iterate through a file structure and get a list of files
- For each file, read it into a buffer and use a regex search to match a specific tag
- If it matches, test two more regex patterns against the match
- Write the resulting match (one or the other) into a database
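The steps above could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python 3 syntax, the helper names `iter_files` and `catalog_embeds` are hypothetical, and SQLite with an `embeds` table stands in for whatever database is actually used.

```python
import os
import re
import sqlite3

# Same tag pattern as in the question.
EMBED_RE = re.compile(r"(<embed .*?</embed>)")


def iter_files(root):
    """Yield the path of every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def catalog_embeds(root, conn):
    """Store each <embed> block found under root in a hypothetical embeds table."""
    conn.execute("CREATE TABLE IF NOT EXISTS embeds (filename TEXT, tag TEXT)")
    for path in iter_files(root):
        # Read the whole file into a buffer, tolerating odd encodings.
        with open(path, encoding="utf-8", errors="replace") as handle:
            filebuffer = handle.read()
        for tag in EMBED_RE.findall(filebuffer):
            conn.execute("INSERT INTO embeds VALUES (?, ?)", (path, tag))
    conn.commit()
```

Calling `catalog_embeds("/path/to/site", sqlite3.connect("pages.db"))` would walk the tree once and record one row per embed tag; both the path and the database file name are placeholders.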
Everything works fine up until the 3rd regex pattern match, where I get the following:
'NoneType' object has no attribute 'group'
# only interested in embedded content
pattern = "(<embed .*?</embed>)"
# matches content pointing to our old root
pattern2 = 'data="(http://.*?/media/.*?")'
# matches content pointing to our new root
pattern3 = 'data="(http://.*?/content/.*?")'
matches = re.findall(pattern, filebuffer)
for match in matches:
    if len(match) > 0:
        urla = re.search(pattern2, match)
        if urla.group(1) is not None:
            print filename, urla.group(1)
        urlb = re.search(pattern3, match)
        if urlb.group(1) is not None:
            print filename, urlb.group(1)
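One likely cause, offered as a sketch rather than a confirmed diagnosis: `re.search` returns `None` when a pattern does not match, so calling `.group(1)` on that result raises the error above whenever an embed tag points at only one of the two roots. The version below tests the match object itself before calling `group`. It uses Python 3 syntax, and the function name `classify_embed` and the `"old"`/`"new"` labels are hypothetical.

```python
import re

# Patterns copied from the question; note the capture group includes the closing quote.
OLD_ROOT_RE = re.compile(r'data="(http://.*?/media/.*?")')
NEW_ROOT_RE = re.compile(r'data="(http://.*?/content/.*?")')


def classify_embed(tag):
    """Return ("old", url) or ("new", url), or None if neither pattern matches."""
    for label, regex in (("old", OLD_ROOT_RE), ("new", NEW_ROOT_RE)):
        found = regex.search(tag)
        # search() returns None on no match, so check the match object
        # before calling group() on it.
        if found is not None:
            return label, found.group(1)
    return None
```

With this shape, the loop over `matches` can call `classify_embed(match)` and skip any tag for which it returns `None`, instead of assuming both searches succeed.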
Thank you.