Regular expression returns only every other result

Question

I am trying to extract results from a web page, but the regular expression I wrote does not return all of them: I get results 1, 3, 5, ... but never 2, 4, 6, ...

This is a text sample:

<tr>
<td style="background-color:white">Inter en attente de cloture : </td>
<td style="background-color:red">depuis +2H</td>
<td style="background-color:#FF7F00">depuis -2H</td>
</tr>
</table>
<table class="tab_script">
<tr>
<td>N° commande</td>
<td>Nom</td>
<td>Prenom</td>
<td>N° Mobile</td>
<td>N° Fixe</td>
<td>Ville</td>
<td>Code Postal</td>
<td>Num. Intervention</td>
<td>date rdv</td>
</tr>

<tr bgcolor="#E5E5E5">
<form method="POST">
<td></td>
<td>NOM 1</td>
<td></td>
<td>0600000000</td>
<td>0400000000</td>
<td>VILLE</td>
<td>12345</td>
<td><a  href="index.php?id=13&statut=2&id_inter=123271915">123271915</a></td>
<td style="background-color:red">23/11/2012 08:30</td>
</tr>
</form>

<tr bgcolor="#FFFFFF">
<form method="POST">
<td></td>
<td>NOM 2</td>
<td></td>
<td>0600000000</td>
<td>0400000000</td>
<td>VILLE</td>
<td>54321</td>
<td><a  href="index.php?id=13&statut=2&id_inter=130680172">130680172</a></td>
<td style="background-color:red">09/03/2013 18:30</td>
</tr>
</form>

<tr bgcolor="#E5E5E5">
<form method="POST">
<td></td>
<td>NOM 3</td>
<td></td>
<td>0600000000</td>
<td>0400000000</td>
<td>VILLE</td>
<td>12345</td>
<td><a  href="index.php?id=13&statut=2&id_inter=123271915">123271915</a></td>
<td style="background-color:red">23/11/2012 08:30</td>
</tr>
</form>

<tr bgcolor="#FFFFFF">
<form method="POST">
<td></td>
<td>NOM 4</td>
<td></td>
<td>0600000000</td>
<td>0400000000</td>
<td>VILLE</td>
<td>54321</td>
<td><a  href="index.php?id=13&statut=2&id_inter=130680172">130680172</a></td>
<td style="background-color:red">09/03/2013 18:30</td>
</tr>
</form>

And my regular expression:

$preg='#<tr.*?>.*?';
$preg.='<form.*?>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='<td>(.*?)</td>.*?';
$preg.='#ism';
preg_match_all($preg,$response,$match);

And the result:

I just noticed that if I copy and paste the first part I want to find several times, preg finds every copy. So the problem is not my regular expression but the text itself; however, I don't see any difference... — Entretoize, Apr 27 '15 at 16:25
Look into the PHP class DOMDocument. There are too many problems with parsing HTML with regular expressions, and many posts on SO about it. Here is an exhaustive example: http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php — John McMahon, Apr 27 '15 at 16:30
I would like to, but I need the code quickly and I don't know how it works. How would you get the data in my example, please? — Entretoize, Apr 27 '15 at 16:41
What he says can be done with regex. I simply believe it is not the quickest nor the simplest of the solutions — ColOfAbRiX, Apr 27 '15 at 16:48
Instead of all those `.*` I would use more specific character sets like `[^>]*` or `[^<]*` — ColOfAbRiX, Apr 27 '15 at 16:51
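The suggestion in the last comment can be sketched as follows. In the sample, the ninth cell of each row is `<td style="...">`, so a literal `<td>` cannot match it; with the `s` flag the lazy `.*?` then runs on into the next row and takes that row's first `<td></td>` as the ninth cell, which consumes that row's `<tr>` and would explain why every other row is missing. The `sampleRow()` helper and the adjusted pattern below are illustrative assumptions, not code from the thread.

```php
<?php
// Hypothetical helper: rebuilds one data row shaped like the sample in the question.
function sampleRow(string $name, string $id, string $date): string
{
    return "<tr bgcolor=\"#E5E5E5\">\n<form method=\"POST\">\n"
        . "<td></td>\n<td>$name</td>\n<td></td>\n"
        . "<td>0600000000</td>\n<td>0400000000</td>\n<td>VILLE</td>\n<td>12345</td>\n"
        . "<td><a  href=\"index.php?id=13&statut=2&id_inter=$id\">$id</a></td>\n"
        . "<td style=\"background-color:red\">$date</td>\n</tr>\n</form>\n\n";
}

$response = sampleRow('NOM 1', '123271915', '23/11/2012 08:30')
    . sampleRow('NOM 2', '130680172', '09/03/2013 18:30')
    . sampleRow('NOM 3', '123271915', '23/11/2012 08:30')
    . sampleRow('NOM 4', '130680172', '09/03/2013 18:30');

// Pattern equivalent to the one in the question: nine literal <td> cells.
$old = '#<tr.*?>.*?<form.*?>.*?' . str_repeat('<td>(.*?)</td>.*?', 9) . '#ism';
$oldCount = preg_match_all($old, $response, $oldMatch, PREG_SET_ORDER);

// Adjusted sketch: [^>]* lets <td> carry attributes, and \s* between cells
// only allows whitespace, so a match cannot spill into the next row.
$cell = '<td[^>]*>(.*?)</td>\s*';
$new = '#<tr[^>]*>\s*<form[^>]*>\s*' . str_repeat($cell, 9) . '#is';
$newCount = preg_match_all($new, $response, $match, PREG_SET_ORDER);

printf("original pattern: %d rows, adjusted pattern: %d rows\n", $oldCount, $newCount);
foreach ($match as $row) {
    // $row[2] is the name cell, $row[8] the link cell, $row[9] the date cell.
    printf("%s | %s | %s\n", $row[2], strip_tags($row[8]), $row[9]);
}
```

This still depends on every data row having exactly nine cells in this order; a DOM parser remains the more robust route if the page layout can vary.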

score 0 · Answer 1 · answered Apr 27 '15 at 16:56

It seems you know the format is consistent across the whole file.

I suggest you do a line-by-line search and store the data you need.
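A minimal sketch of such a line-by-line scan, assuming (as in the sample) that every tag sits on its own line and that data rows start with `<tr bgcolor`. The variable names and the two-row input are made up for illustration.

```php
<?php
// Hypothetical input: two rows in the same layout as the sample in the question.
$response = <<<'HTML'
<tr bgcolor="#E5E5E5">
<form method="POST">
<td></td>
<td>NOM 1</td>
<td></td>
<td>0600000000</td>
<td>0400000000</td>
<td>VILLE</td>
<td>12345</td>
<td><a  href="index.php?id=13&statut=2&id_inter=123271915">123271915</a></td>
<td style="background-color:red">23/11/2012 08:30</td>
</tr>
</form>

<tr bgcolor="#FFFFFF">
<form method="POST">
<td></td>
<td>NOM 2</td>
<td></td>
<td>0600000000</td>
<td>0400000000</td>
<td>VILLE</td>
<td>54321</td>
<td><a  href="index.php?id=13&statut=2&id_inter=130680172">130680172</a></td>
<td style="background-color:red">09/03/2013 18:30</td>
</tr>
</form>
HTML;

$rows = [];
$current = null; // cells of the row being read, or null when outside a data row

foreach (preg_split('/\R/', $response) as $line) {
    $line = trim($line);
    if (stripos($line, '<tr bgcolor') === 0) {
        // Assumption: only data rows carry a bgcolor attribute.
        $current = [];
    } elseif (stripos($line, '</tr>') === 0) {
        if ($current !== null) {
            $rows[] = $current;
        }
        $current = null;
    } elseif ($current !== null && preg_match('#^<td[^>]*>(.*)</td>$#i', $line, $m)) {
        // Keep only the text of the cell, dropping the <a> wrapper if present.
        $current[] = trim(strip_tags($m[1]));
    }
}

foreach ($rows as $row) {
    echo implode(' | ', $row), "\n";
}
```

Each entry of `$rows` is then a plain array of nine cell strings in column order.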

An even better solution would be to use PHP's SAX parser, which does exactly that, but for XML:

The code in the link calls startElement every time it finds an opening tag and endElement for every closing tag. Since you know your structure, it is very easy to find the data you need.
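The callback flow described above might look roughly like this with PHP's XML Parser extension. This is not the linked code, only a sketch on a made-up, well-formed fragment; the page in the question has unescaped `&` characters and mis-nested `<form>` tags, so it would need cleaning before an XML parser accepts it.

```php
<?php
// Hypothetical well-formed input; the real page is not valid XML as-is.
$xml = '<table>'
    . '<tr><td>NOM 1</td><td>VILLE</td><td>23/11/2012 08:30</td></tr>'
    . '<tr><td>NOM 2</td><td>VILLE</td><td>09/03/2013 18:30</td></tr>'
    . '</table>';

$rows = [];
$row = [];
$text = null; // text of the current <td>, or null when outside a cell

$parser = xml_parser_create();

// Case folding is on by default, so tag names arrive upper-cased.
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$row, &$text) {
        if ($name === 'TR') {
            $row = [];       // start collecting a new row
        } elseif ($name === 'TD') {
            $text = '';      // start collecting a new cell
        }
    },
    function ($parser, string $name) use (&$rows, &$row, &$text) {
        if ($name === 'TD') {
            $row[] = trim($text);
            $text = null;
        } elseif ($name === 'TR') {
            $rows[] = $row;
        }
    }
);

// Character data may arrive in several chunks, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$text) {
        if ($text !== null) {
            $text .= $data;
        }
    }
);

if (xml_parse($parser, $xml, true) !== 1) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

foreach ($rows as $cells) {
    echo implode(' | ', $cells), "\n";
}
```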
