I have several hundred HTML files (Pidgin IM log files) that have exactly the same format:
<html>
<head><meta ...><title>...</title></head>
<body>
<h3>...</h3>
<font color=...><font ...>(TIME)</font> <b>(NAME):</b></font> (MESSAGE)<br/>
<font color=...><font ...>(TIME)</font> <b>(NAME):</b></font> (MESSAGE)<br/>
<font color=...><font ...>(TIME)</font> <b>(NAME):</b></font> (MESSAGE)<br/>
...
(There are no closing body/html tags; those lines simply repeat until EOF.)
I need to extract the time, name, and message from each of those lines. I'm not great with regex, and the HTML libraries I've tried seem more complex than this job needs. Any suggestions?
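To make the goal concrete, here is a minimal sketch of the kind of extraction I'm after, in Python with only the standard library. It assumes every message sits on one line shaped exactly like the sample above, and that the timestamp may or may not be wrapped in literal parentheses. The names `LINE_RE`, `parse_log`, and `parse_file` are made up for illustration, and the UTF-8 encoding is a guess.

```python
import html
import re

# Assumed line shape, taken from the sample above:
#   <font color=...><font ...>(TIME)</font> <b>NAME:</b></font> MESSAGE<br/>
# The parentheses around TIME are treated as optional because it is not
# certain whether they are literal or just placeholder notation.
LINE_RE = re.compile(
    r'<font[^>]*><font[^>]*>\(?(?P<time>[^<)]*)\)?</font>\s*'
    r'<b>(?P<name>[^<]*?):</b></font>\s*'
    r'(?P<message>.*)<br\s*/?>\s*$'
)


def parse_log(text):
    """Return a list of (time, name, message) tuples found in the log text."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            # Header lines (<html>, <head>, <h3>, ...) do not match; skip them.
            continue
        records.append((
            match.group('time').strip(),
            html.unescape(match.group('name')),
            # Entities are decoded; any inline tags in the message are kept.
            html.unescape(match.group('message')).strip(),
        ))
    return records


def parse_file(path, encoding='utf-8'):
    """Read one log file and parse it (the encoding is an assumption)."""
    with open(path, encoding=encoding) as handle:
        return parse_log(handle.read())
```

Calling `parse_file` on each file in the log directory would give one list of tuples per conversation. Lines that deviate from the assumed shape are silently skipped, so something more forgiving would be needed if the format turns out to be less regular than I think.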