0

I need a way to grab certain elements from another site and replace their contents with my own content, using PHP.

Suppose a website has a page that contains the following div and table:

    <div>Div Contents!</div>

    <table>
      <tr>
        <td>Table Column 1</td>
        <td>Table Column 2</td>
      </tr>
    </table>

I need to be able to grab this information, and replace "Div Contents!", "Table Column 1", and "Table Column 2" with my own data.

What would be the best way to do this? Regular expressions, or strpos/str_replace/substr and the like?

I appreciate any help and examples you provide.

Jonathon Reinhart
user826855

3 Answers

1

Use cURL to grab the HTML content from a remote source, use regex (preg_match()) or a series of string operations to extract the data you want, then output your data in the desired format from the variables assigned during parsing.

Ideally regex will be a lot faster to build and test, but a strpos/substr combo can also do the trick.

*I've built data mining programs before.
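
As a rough illustration of the regex route, here is a minimal sketch that works on an HTML string that has already been fetched. The function name `replaceTextWithRegex` and the sample values are made up for this example, and the pattern assumes the simple markup from the question: `<div>` and `<td>` elements that contain plain text and no nested tags. Anything more complicated is likely to break it.

```php
<?php
// Hypothetical helper (not from the original answer): replaces the text of
// <div> and <td> elements, in document order, with the supplied values.
function replaceTextWithRegex(string $html, array $replacements): string
{
    $i = 0;
    // Matches <div ...>text</div> or <td ...>text</td> where "text" holds no tags.
    $pattern = '~<(div|td)\b([^>]*)>([^<]*)</\1>~i';

    return preg_replace_callback($pattern, function (array $m) use (&$i, $replacements) {
        // Escape our own data; keep the original text once we run out of values.
        $text = array_key_exists($i, $replacements)
            ? htmlspecialchars($replacements[$i])
            : $m[3];
        $i++;
        return '<' . $m[1] . $m[2] . '>' . $text . '</' . $m[1] . '>';
    }, $html);
}

// Sample markup taken from the question.
$html = '<div>Div Contents!</div><table><tr><td>Table Column 1</td><td>Table Column 2</td></tr></table>';
$result = replaceTextWithRegex($html, ['My div', 'My cell 1', 'My cell 2']);
echo $result;
```

The callback form is used so each match can receive a different value; a plain preg_replace() would put the same text into every element.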

Matt Lo
  • who says it was :\ this was a long time ago. But you need to have the experience to know what the fastest way is, not just assume in theory. – Matt Lo Jan 22 '12 at 21:44
  • Well, both theory and practice say that one shouldn't use regex with HTML :) – Christian Jan 22 '12 at 21:49
  • In practice, simpleXMLElement() is pretty good at iterating through elements, even for this case, but when it's about traversing, you can optimize a series of regex methods to gain faster results (even if it's only 4/10ths faster). In the past I know regex was faster to deploy if you knew how to write it effectively the first time around. Efficiency in data mining (when I was doing it) wasn't important enough for the client if the results weren't greatly improved in quality. However, I do agree regex on large strings can be troublesome. – Matt Lo Jan 22 '12 at 21:57
0
  1. Fetch the other page's HTML with cURL
  2. Parse and modify - See this question. Most likely you will want to use a native PHP library like XMLReader or DOM.
  3. Display resulting HTML
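
The three steps above could be sketched roughly as follows. This assumes the cURL and DOM extensions are available; the function names `fetchHtml` and `setFirstElementText`, the example URL, and the timeout value are placeholders chosen for illustration. The demo at the bottom runs on an inline sample string so it does not depend on a live site.

```php
<?php
// Step 1: fetch the remote page's HTML with cURL (returns null on failure).
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds; an arbitrary choice
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Step 2: parse the HTML and set the text of the first element with the given tag.
function setFirstElementText(string $html, string $tag, string $text): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect warnings about sloppy markup quietly
    $doc->loadHTML($html);
    libxml_clear_errors();

    $node = $doc->getElementsByTagName($tag)->item(0);
    if ($node !== null) {
        $node->nodeValue = $text;
    }
    return $doc->saveHTML();
}

// Step 3: display the result.
// In real use: $html = fetchHtml('http://example.com/page.php');
$html = '<div>Div Contents!</div>'; // inline sample so the sketch runs offline
$output = setFirstElementText($html, 'div', 'My own content');
echo $output;
```

Note that saveHTML() serializes a whole document, so the output is wrapped in a doctype plus `<html>` and `<body>` elements even when the input was only a fragment.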
Community
Jonathon Reinhart
0
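// read URL into a DOM document (PHP 5+ DOM extension; domxml_* is PHP 4 only)
$doc = new DOMDocument();
// real-world HTML is rarely well-formed, so collect parser warnings quietly
libxml_use_internal_errors(true);
$doc->loadHTMLFile('http://domain.com/test.php');
// replace content of div
$els = $doc->getElementsByTagName('div');
$els->item(0)->nodeValue = 'new content';
// replace content of tds
$els = $doc->getElementsByTagName('td');
$els->item(0)->nodeValue = 'new content';
$els->item(1)->nodeValue = 'new content';
// echo the final output
echo $doc->saveHTML();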

Notes

  • The above code should work with URLs directly, without having to use cURL.
  • I changed the values directly, assuming the page really has the structure you described. For anything more general, you should loop over the elements instead.
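
The loop mentioned in the notes might look something like this. It is only a sketch: the helper name `replaceByTag` and the replacement values are invented for the example, and it assumes the elements you want are simply the first N of each tag in document order, with no nesting of the same tag.

```php
<?php
// Hypothetical helper: for each tag name, fills the matching elements
// (in document order) with the supplied list of values.
function replaceByTag(DOMDocument $doc, array $valuesByTag): void
{
    foreach ($valuesByTag as $tag => $values) {
        $nodes = $doc->getElementsByTagName($tag);
        foreach (array_values($values) as $index => $value) {
            $node = $nodes->item($index);
            if ($node === null) {
                break; // the page has fewer elements than we have values
            }
            $node->nodeValue = $value;
        }
    }
}

// Sample markup taken from the question, parsed from a string for the demo.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<div>Div Contents!</div><table><tr><td>Table Column 1</td><td>Table Column 2</td></tr></table>');
libxml_clear_errors();

replaceByTag($doc, [
    'div' => ['My div'],
    'td'  => ['My cell 1', 'My cell 2'],
]);
echo $doc->saveHTML();
```

Keeping the values in an array keyed by tag name means a change in the number of cells only requires editing the data, not the code.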
Christian