A site presents its data in a table. Its HTML source looks like this:

<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Limited Partner Summit</td>
        <td>3-4 June 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>LP-GP Forum: Infrastructure &amp; Real Estate</td>
        <td>7 October 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>Envirotech &amp; Clean Energy Investor Summit</td>
        <td>4-5 November 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody>

Is it possible to write a regex that extracts the event, date, and city separately?

  • Why don't you use a [real HTML parser](http://stackoverflow.com/a/1732454/782822)? – TimWolla Feb 20 '14 at 18:19
  • What do you mean by `gives event, date, city separately`? They are already in separate `<td>` tags . . . how do you want them to be separated more? – talemyn Feb 20 '14 at 18:19
  • @talemyn: I want to extract each of them and store them in separate variables named event, date, and city. I could not figure it out. –  Feb 20 '14 at 18:28
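
For reference, the parser route suggested in the comments might look roughly like the sketch below. It assumes PHP's DOM extension is available and that the fragment is wrapped in a `<table>` element; the `$html` sample and the `$rows` variable are illustrative names, not something from the original page.

```php
<?php
// Hypothetical sample: two rows from the question, wrapped in <table>
// so the HTML parser keeps the row structure.
$html = '<table><tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>LP-GP Forum: Infrastructure &amp; Real Estate</td>
        <td>7 October 2014</td>
        <td>London, UK</td>
    </tr>
</tbody></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$rows = array();
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $tds = $tr->getElementsByTagName('td');
    if ($tds->length < 4) {
        continue; // assumption: every data row has four cells, the first one empty
    }
    // textContent drops inner tags such as <a> and decodes entities such as &amp;
    $rows[] = array(
        'event' => trim($tds->item(1)->textContent),
        'date'  => trim($tds->item(2)->textContent),
        'city'  => trim($tds->item(3)->textContent),
    );
}

print_r($rows);
```

Each element of `$rows` then carries the three values under the keys event, date, and city.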

2 Answers

1

You should be able to use: <td>.+?</td>

Paul Way
  • Thanks, Paul. So it is possible. Can you please expand on how to get each `td` value separately for each `tr`? –  Feb 20 '14 at 18:13
  • Well, to TimWolla's point above... RegEx isn't the best tool for parsing HTML. If you do use it, I'd use two loops. The outer loop would match the rows with `<tr>.+?</tr>` and the inner loop would use the `<td>` pattern. – Paul Way Feb 20 '14 at 18:28
  • I could not understand the structure of the regex here. Can you please give me a clue by showing the loop? –  Feb 20 '14 at 18:30
  • `preg_match_all("/(?<=<td>).+?(?=<\/td>)/", $source_string, $matches)` will grab just the values. Since your HTML is very well structured, you could map the results array to a new array whose elements each contain an array of three elements (keys being event, date, city). – tenub Feb 20 '14 at 18:54
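
The mapping described in the last comment could be sketched as follows. This is only an illustration under a few assumptions: the pattern is the lookaround form `(?<=<td>).+?(?=<\/td>)`, every row has exactly three non-empty cells, and `$source_string` holds a shortened, hypothetical copy of the table. The `strip_tags` and `html_entity_decode` calls are an addition to deal with the `<a>` link and `&amp;` in the sample data.

```php
<?php
// Hypothetical sample input: two rows copied from the question's table.
$source_string = <<<HTML
<tbody>
    <tr>
        <td></td>
        <td><a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a></td>
        <td>27 March 2014</td>
        <td>London, UK</td>
    </tr>
    <tr>
        <td></td>
        <td>AltAssets Fundraising &amp; IR Forum</td>
        <td>9 December 2014</td>
        <td>Hong Kong</td>
    </tr>
</tbody>
HTML;

// Grab the contents of every non-empty <td>. Without the /s flag "." does not
// cross newlines and ".+?" needs at least one character, so the empty first
// cell of each row produces no match.
preg_match_all('/(?<=<td>).+?(?=<\/td>)/', $source_string, $matches);

$events = array();
// Assumption: the matches come in groups of three (event, date, city).
foreach (array_chunk($matches[0], 3) as $cells) {
    // Drop inner markup such as <a> and decode entities such as &amp;.
    $cells = array_map(function ($cell) {
        return html_entity_decode(strip_tags($cell));
    }, $cells);
    $events[] = array_combine(array('event', 'date', 'city'), $cells);
}

print_r($events);
```

If a row ever had a different number of filled cells, the chunks would drift out of step, so this only suits markup as regular as the sample.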
1
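// Collect the contents of every <tr> (s: dot also matches newlines, U: ungreedy).
$matches = array();
preg_match_all("/<tr>(.*)<\/tr>/sU", $s, $matches);
$trs = $matches[1];

// For each row, collect the contents of every <td>.
$td_matches = array();
foreach ($trs as $tr) {
    $tdmatch = array();
    preg_match_all("/<td>(.*)<\/td>/sU", $tr, $tdmatch);
    $td_matches[] = $tdmatch[1];
}
print_r($td_matches);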

Put your HTML string in $s. Afterwards, $td_matches holds a nested array: one entry per TR, each containing the contents of that row's TD cells.
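
As a possible follow-up, here is a rough sketch of turning that nested array into the three named values the question asks for. The sample `$td_matches` contents below are hypothetical but shaped like the result for the table in the question (an empty first cell, then event, date, city); the `$lines` array is just an illustrative way to collect the output.

```php
<?php
// Hypothetical sample of what $td_matches would hold for two of the rows.
$td_matches = array(
    array('', '<a href="http://www.altassets.net/ventureforum/" target="_blank">AltAssets Venture Forum</a>', '27 March 2014', 'London, UK'),
    array('', 'Envirotech &amp; Clean Energy Investor Summit', '4-5 November 2014', 'London, UK'),
);

$lines = array();
foreach ($td_matches as $cells) {
    if (count($cells) < 4) {
        continue; // skip rows that do not have the expected four cells
    }
    // Skip the empty first cell and name the remaining three.
    list(, $event, $date, $city) = $cells;
    // Remove inner tags such as <a> and decode entities such as &amp;.
    $event = html_entity_decode(strip_tags($event));
    $lines[] = "$event | $date | $city";
}

echo implode("\n", $lines), "\n";
```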

ntaso