How can I find all spans with a class of 'blue'
that contain text in the format:
04/18/13 7:29pm
which could therefore be:
04/18/13 7:29pm
or:
Posted on 04/18/13 7:29pm
In terms of constructing the logic to do this, this is what I have so far:
new_content = original_content.find_all('span', {'class' : 'blue'}) # using beautiful soup's find_all
pattern = re.compile('<span class=\"blue\">[data in the format 04/18/13 7:29pm]</span>') # using re
for _ in new_content:
    result = re.findall(pattern, _)
    print result
I've been referring to https://stackoverflow.com/a/7732827 and https://stackoverflow.com/a/12229134 to try to figure out a way to do this, but the above is all I have so far.
edit:
To clarify the scenario, there are spans with:
<span class="blue">here is a lot of text that i don't need</span>
and
<span class="blue">this is the span i need because it contains 04/18/13 7:29pm</span>
Note that I only need 04/18/13 7:29pm,
not the rest of the content.
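To make the target concrete, here is a minimal sketch of the extraction I'm after, on plain strings and leaving BeautifulSoup out of it. The name `TIMESTAMP` and the exact pattern are assumptions about the format (two-digit month/day/year, one- or two-digit hour, am/pm suffix), not something checked against the real page:

```python
import re

# Assumed format: MM/DD/YY H:MMam|pm (the hour may be one or two digits).
TIMESTAMP = re.compile(r'\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}(?:am|pm)')

samples = [
    "here is a lot of text that i don't need",
    "this is the span i need because it contains 04/18/13 7:29pm",
    "Posted on 04/18/13 7:29pm",
]

# Keep only the timestamp itself and skip strings that have none.
found = []
for text in samples:
    match = TIMESTAMP.search(text)
    if match:
        found.append(match.group(0))

print(found)
```

Using `search` plus `group(0)` returns just the matched date and time, so the surrounding text ("Posted on", etc.) is dropped.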
edit 2:
I also tried:
pattern = re.compile('<span class="blue">.*?(\d\d/\d\d/\d\d \d\d?:\d\d\w\w)</span>')
for _ in new_content:
    result = re.findall(pattern, _)
    print result
and got the error:
'TypeError: expected string or buffer'
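As far as I can tell, the TypeError comes from passing a BeautifulSoup Tag object to `re.findall`, which expects a string. A minimal sketch that runs the regex on each tag's text instead is below. The sample HTML (including the `red` span), the `html.parser` choice, and the name `results` are assumptions for illustration only, and `print` is called with parentheses so the snippet runs on both Python 2 and 3:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = (
    '<span class="blue">here is a lot of text that i don\'t need</span>'
    '<span class="blue">this is the span i need because it contains 04/18/13 7:29pm</span>'
    '<span class="red">Posted on 01/02/13 10:05am</span>'
)
soup = BeautifulSoup(html, 'html.parser')

# Match only the date and time, not the surrounding markup or text.
pattern = re.compile(r'\d\d/\d\d/\d\d \d\d?:\d\d\w\w')

results = []
for span in soup.find_all('span', {'class': 'blue'}):
    # get_text() returns the tag's text as a string, which re can search.
    results.extend(pattern.findall(span.get_text()))

print(results)
```

Since `find_all` has already narrowed things down to the blue spans, the pattern no longer needs the `<span class="blue">` wrapper, and spans without a timestamp simply contribute nothing to `results`.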