I am trying to use a regex to pull the following out of a site's HTML:
blahblahblah
<a  href="THIS IS WHAT I WANT" title="NOT THIS">I DONT CARE ABOUT THIS EITHER</a>
blahblahblah
(There are many of these, and I want all of them in some tokenized form.) The problem is that the tags I want are written as "a  href" with TWO spaces between "a" and "href", not just one. There are also some written as "a href" with a single space, and I do NOT want to retrieve those. Since an HTML parser treats both forms identically, LXML has proven to be quite a pain here, and I do not want to use BeautifulSoup (for other reasons). Does anyone know how I might go about doing this?
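To make the goal concrete, here is a minimal sketch in Python of the behaviour I am after, using only the standard `re` module. The names `TWO_SPACE_HREF` and `extract_hrefs` and the sample string are made up for illustration, and the pattern assumes the href value is double-quoted and comes right after the two spaces, which may not hold for every page.

```python
import re

# Match "<a", then exactly two literal spaces, then href="...".
# The capture group holds the href value (everything up to the closing quote).
TWO_SPACE_HREF = re.compile(r'<a {2}href="([^"]*)"')


def extract_hrefs(html):
    """Return the href values of anchors written with two spaces after '<a'."""
    return TWO_SPACE_HREF.findall(html)


# Hypothetical input: one anchor with two spaces, one with a single space.
sample = (
    'blahblahblah <a  href="keep-me" title="NOT THIS">text</a> blahblahblah '
    '<a href="skip-me" title="NOT THIS">text</a>'
)

print(extract_hrefs(sample))  # prints ['keep-me']
```

Anchors with one space (or three) after `<a` are skipped because the pattern requires `href` to follow exactly two spaces.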
Thanks!