I'm trying to extract some URLs from an HTML file using Python. Here is what the text looks like:
preabc!precde<preefg<
I want to extract "cde" and "efg". The patterns I've tried:
pre(.*?)<
pre(.(?!^pre)).*?<
However, neither of them works :(. Note that the real lengths of "cde" and "efg" are unknown. I'm not familiar with regular expressions, so please explain your answers. Many thanks.
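For reference, here is a minimal reproduction of the two attempts using Python's standard `re` module. The variable names are only illustrative, and the comments show what `re.findall` returns on the sample string above:

```python
import re

text = "preabc!precde<preefg<"

# Attempt 1: lazy match from the first "pre" to the next "<".
# The match starts at the leftmost "pre", so it swallows the second one.
first = re.findall(r"pre(.*?)<", text)
print(first)   # ['abc!precde', 'efg'], but I want ['cde', 'efg']

# Attempt 2: the group only wraps a single character, and "^" inside
# the lookahead can only match at the start of the string, so the
# lookahead never rejects anything.
second = re.findall(r"pre(.(?!^pre)).*?<", text)
print(second)  # ['a', 'e']
```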
EDIT:
Sorry for my bad explanation and ambiguous example. I want to extract titles like "GIRL FRIENDS" that have a certain date (2014-7-31 in this case):
<a href="http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=662128&extra=page%3D1" onclick="atarget(this)" class="s xst">GIRL FRIENDS</a>
<span class="tps"> ...<a href="http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=662128&extra=page%3D1&page=2">2</a></span>
<a href="http://rs.xidian.edu.cn/forum.php?mod=redirect&tid=662128&goto=lastpost#lastpost" class="xi1">New</a>
</th>
<td class="by">
<cite>
<a href="http://rs.xidian.edu.cn/home.php?mod=space&uid=265770" c="1">机器人</a></cite>
<em><span><span title="2014-7-31">昨天 23:55</span></span></em>
</td>
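To make the goal concrete, below is a rough sketch of the result I'm after. It uses the standard-library `html.parser` rather than a regex, which is only one possible approach. The class name `ThreadTitleParser`, the trimmed `sample` string, and the assumption that title links always carry `class="s xst"` and that the date sits in a `span` `title` attribute are all illustrative guesses based on the snippet above.

```python
from html.parser import HTMLParser


class ThreadTitleParser(HTMLParser):
    """Pairs each thread title with the first dated <span> that follows it."""

    def __init__(self):
        super().__init__()
        self.in_title = False      # True while inside <a class="s xst">
        self.current_title = None  # title waiting for its date
        self.threads = []          # list of (title, date) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "s xst":
            self.in_title = True
            self.current_title = ""
        elif tag == "span" and "title" in attrs and self.current_title is not None:
            # Assumption: the first <span title="..."> after a title holds its date.
            self.threads.append((self.current_title, attrs["title"]))
            self.current_title = None

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.current_title += data


# Trimmed copy of the snippet above.
sample = """
<a href="http://rs.xidian.edu.cn/forum.php?mod=viewthread&tid=662128&extra=page%3D1" onclick="atarget(this)" class="s xst">GIRL FRIENDS</a>
<a href="http://rs.xidian.edu.cn/forum.php?mod=redirect&tid=662128&goto=lastpost#lastpost" class="xi1">New</a>
</th>
<td class="by">
<em><span><span title="2014-7-31">昨天 23:55</span></span></em>
</td>
"""

parser = ThreadTitleParser()
parser.feed(sample)

# Keep only the titles posted on the date of interest.
wanted = [title for title, date in parser.threads if date == "2014-7-31"]
print(wanted)  # ['GIRL FRIENDS']
```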