I'm building a system where I'll need to grab the contents of a web page with PHP and then parse it to extract certain tables etc. Is there an easy way to do this with jQuery, or would the best way be to write a PHP function to extract the data?
Viewed 1,127 times
3
-
This may help, too: http://stackoverflow.com/questions/292926/ – mdo Nov 03 '10 at 20:54
6 Answers
7
jQuery has nothing to do with PHP and can't be run without a browser, so you're out of luck there.
However, there is phpQuery that allows DOM parsing with jQuery's selectors!

Pekka
-
Actually, jQuery can be run without a browser (Rhino, V8, etc.)... but that's beside the point. It's just a small addendum. – Frankie Nov 03 '10 at 20:09
4
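Do it like this in PHP with the native DOM functions and XPath:
// $html is assumed to already hold the page source
$dom = new DOMDocument();
// @ suppresses the warnings that malformed real-world HTML triggers
@$dom->loadHTML($html);
$x = new DOMXPath($dom);
// grab all tables with id of foo
foreach ($x->query("//table[@id='foo']") as $node)
{
    // here is the html
    echo $node->C14N();
    // grab the contained text (textContent is a property, not a method)
    echo $node->textContent;
}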

Byron Whitlock
1
You can use the DOM functions available in PHP: http://php.net/manual/en/book.dom.php
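As a rough sketch of what that could look like for the table-extraction case in the question, the snippet below uses only the built-in DOM classes to turn each row of the first table into an array of cell strings. The helper name `tableToRows` and the sample markup are made up for illustration, and nested tables are not handled.

```php
<?php
// Hypothetical helper: returns the rows of the first <table> as arrays of cell text.
function tableToRows(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return $rows; // no table in the document
    }
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Keep only <td> and <th> element children of the row.
            if ($cell instanceof DOMElement && in_array($cell->nodeName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Sample markup, invented for this example.
$html = '<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td>3</td></tr></table>';
print_r(tableToRows($html));
```

Returning plain arrays keeps the parsing step separate from whatever you do with the data afterwards.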

Jochen
1
You can't. jQuery is for JavaScript, which is client-side, and requires a JavaScript engine to execute.
I would suggest you read the HTML as XML, but you'll run into all sorts of trouble if the HTML is not valid XHTML.
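To illustrate the kind of trouble meant here, the following sketch feeds the same sloppy markup to PHP's strict XML loader and to its more forgiving HTML loader. The sample markup is invented, and the comparison with `loadHTML()` is an addition for contrast rather than part of the suggestion above.

```php
<?php
// Typical "tag soup": unclosed <br> and <p>, unquoted attribute value.
$html = '<html><body><p class=intro>Hello<br>world</body></html>';

// Keep parse errors out of the output; they can be inspected via libxml_get_errors().
libxml_use_internal_errors(true);

// Strict XML parsing: loadXML() reports failure for markup that is not well-formed.
$asXml = new DOMDocument();
$xmlOk = $asXml->loadXML($html);
libxml_clear_errors();

// HTML parsing: loadHTML() tries to repair the markup instead.
$asHtml = new DOMDocument();
$htmlOk = $asHtml->loadHTML($html);
libxml_clear_errors();

var_dump($xmlOk);
var_dump($htmlOk);
echo $asHtml->getElementsByTagName('p')->item(0)->textContent, "\n";
```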

John Giotta
0
This is awesome:
http://sourceforge.net/projects/simplehtmldom/
Example:
// Load the library file that ships with the download
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';
// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';

vertazzar
-
Don't do this. I used to use simplehtmldom all the time, but it is slow as molasses. Use the built-in DOM functions. They are an order of magnitude faster. Here is a benchmark to prove it: http://whitlock.ath.cx/FastCrawl/benchmark.php – Byron Whitlock Nov 03 '10 at 20:08
-
I agree, but I had some issues with DOM sometimes refusing to parse because of an encoding problem (even though the file was UTF-8), at least on my localhost. – vertazzar Nov 03 '10 at 20:19
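One common cause of this kind of problem is that `DOMDocument::loadHTML()` may not treat the input as UTF-8 when the markup declares no charset, so multi-byte characters come out garbled. Whether that was the issue here is only a guess. A widely used workaround, sketched below with invented sample markup, is to prepend an XML encoding declaration before loading.

```php
<?php
// Sample UTF-8 markup with no charset declaration (invented for this example).
$html = '<html><body><p>café</p></body></html>';

// Keep parser warnings out of the output.
libxml_use_internal_errors(true);

// Hint that the input is UTF-8 by prepending an XML encoding declaration.
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

If the page is in some other encoding, converting it to UTF-8 first (for example with `mb_convert_encoding()`) is another option.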
0
There are a few PHP packages that can help you with this: cURL, DOM and XPath.
Here's a good tutorial I've used before.
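A minimal sketch of how those three pieces might fit together, assuming the cURL and DOM extensions are enabled: cURL fetches the page, `DOMDocument` parses it, and `DOMXPath` picks out the table cells. The helper names `fetchPage` and `cellTexts`, the URL, and the table id are all placeholders, not taken from the tutorial.

```php
<?php
// Hypothetical helper: fetch a page body with cURL, or null on failure.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: text of every <td> inside the tables matched by an XPath expression.
function cellTexts(string $html, string $tableXPath): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query($tableXPath . '//td') as $td) {
        $texts[] = trim($td->textContent);
    }
    return $texts;
}

// Usage (URL and table id are placeholders):
// $html = fetchPage('http://example.com/report.html');
// if ($html !== null) {
//     print_r(cellTexts($html, "//table[@id='foo']"));
// }
```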

Parris Varney