I have some HTML source code and I need to extract text from it. I cannot use the DOM, because the document is not well-formed.
The source may change later without my knowing, so the solution should be robust enough to work in most situations.
I am fetching the source with curl, and I plan to process it with the preg_match_all function and regular expressions.
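Here is a minimal sketch of what I am attempting. It assumes the page is already in a string: the hard-coded `$html` below only stands in for the curl response, and the pattern is my first guess, not something I know to be robust.

```php
<?php
// Stand-in for the curl response; in the real script this string would come
// from curl_exec() with CURLOPT_RETURNTRANSFER enabled.
$html = <<<'HTML'
<TR Class="Head1">
<TD width="15%"><font size="12">Name</font></TD>
<TD>: </TD>
<TD align="center"><font color="red">Alex</font></TD>
</TR>
HTML;

// Match every <TD ...>...</TD> cell, case-insensitively (i) and across
// newlines (s), with a lazy body so each cell is matched separately.
preg_match_all('~<td\b[^>]*>(.*?)</td>~is', $html, $matches);

// Drop whatever tags remain inside each cell and trim the whitespace.
$cells = array_map(
    function ($cell) { return trim(strip_tags($cell)); },
    $matches[1]
);

print_r($cells);
```

This gives me the cell texts in document order, but it relies on every cell having both an opening and a closing TD tag.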
Source:
...
<TR Class="Head1">
<TD width="15%"><font size="12">Name</font></TD>
<TD>: </TD>
<TD align="center"><font color="red">Alex</font></TD>
<TD width="25%"><b>Job</b></TD>
<TD>: </B></TD>
<TD align="center" width="25%"><font color="red">Doctor</font></TD>
</TR>
...
...
<TR Class="Head2">
<TD width="15%" align="left">Age</B></TD>
<TD>: </TD>
<TD align="center"><font color="red">32</font></TD>
<TD width="15%"><font size="10">data</TD></font>
<TD> </B></TD>
<TD width="40%"> </TD>
</TR>
...
As you can see, the source is not well-formed. In fact, it is terrible! But there is nothing I can do about that. The real source is longer than this excerpt.
How can I get the data out of the source? I could strip all the HTML tags, but then how would I know the order of the data? What can I do with preg_match_all and regex? What else could I do?
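To make the ordering question concrete, this is a rough sketch of one idea: strip the tags cell by cell instead of all at once, so the position of each value within its row is kept. The helper name `extractRows` is hypothetical, and the grouping into (label, ":", value) triples is only an assumption based on the sample rows above; it would break if the real layout differs.

```php
<?php
// Hypothetical helper: returns the stripped text of every cell, row by row.
function extractRows(string $html): array
{
    $rows = [];
    // One match per <TR ...>...</TR> block.
    preg_match_all('~<tr\b[^>]*>(.*?)</tr>~is', $html, $trMatches);
    foreach ($trMatches[1] as $rowHtml) {
        // One match per <TD ...>...</TD> cell inside the row.
        preg_match_all('~<td\b[^>]*>(.*?)</td>~is', $rowHtml, $tdMatches);
        $rows[] = array_map(
            function ($cell) { return trim(strip_tags($cell)); },
            $tdMatches[1]
        );
    }
    return $rows;
}

// Sample row copied from the source above, stray tags included.
$html = <<<'HTML'
<TR Class="Head2">
<TD width="15%" align="left">Age</B></TD>
<TD>: </TD>
<TD align="center"><font color="red">32</font></TD>
<TD width="15%"><font size="10">data</TD></font>
<TD> </B></TD>
<TD width="40%"> </TD>
</TR>
HTML;

$data = [];
foreach (extractRows($html) as $cells) {
    // Assumed layout: cells come in groups of three (label, ":", value).
    foreach (array_chunk($cells, 3) as $group) {
        if (count($group) === 3 && $group[0] !== '') {
            $data[$group[0]] = $group[2];
        }
    }
}

print_r($data);
```

With this row, "Age" ends up paired with "32" and "data" with an empty string, but I do not know whether this approach survives the rest of the page.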
Thanks in advance for your help.