I have, in Python:
links = re.match(r'''<A HREF="(\w+?\.htm)#\w*?">''', workbench)
'workbench' holds the contents of a file read into memory as a single string, with line separators replaced by spaces; one such file is at: http://pastebin.com/a0LHKXcS
There are some links that don't interest me; they all have lowercase 'a' or 'href'. As far as I can tell, matching against the file in the pastebin should give me a lot of matches. But so far re.match() is returning None rather than a populated MatchObject I can pull data from. I also tried on the command line, cutting the regular expression down to be more tolerant of differences, and a search for HREF didn't find anything either.
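A minimal sketch of the behaviour, assuming a made-up 'sample' string in place of the real file (the sample text and its file names are hypothetical, not taken from the pastebin). It runs the same pattern through re.match(), re.search() and re.findall() for comparison, since re.match() only tries the pattern at the very start of the string:

```python
import re

# Hypothetical stand-in for the file contents; not taken from the pastebin.
sample = (
    '<html> <body> <a href="skip.htm#top">skip</a> '
    '<A HREF="page1.htm#sec1">one</A> '
    '<A HREF="page2.htm#">two</A> </body> </html>'
)

# Same pattern as in the call above, compiled once for reuse.
pattern = re.compile(r'<A HREF="(\w+?\.htm)#\w*?">')

# re.match() only tries the pattern at position 0 of the string.
at_start = pattern.match(sample)

# re.search() scans forward and returns the first match anywhere.
first = pattern.search(sample)

# re.findall() returns the captured group from every match.
all_links = pattern.findall(sample)

print(at_start)        # None
print(first.group(1))  # page1.htm
print(all_links)       # ['page1.htm', 'page2.htm']
```

With this sample, only the uppercase links are picked up, and only by the calls that scan past the start of the string; whether the real file behaves the same way depends on its exact contents.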
How can I adjust the regular expression (or other factors) so the call gets a populated MatchObject?
Thanks