PHP CSS Selector Library?

Question

Is there a PHP class/library that would allow me to query an XHTML document with CSS selectors? I need to scrape some pages for data that would be very easily accessible if I could somehow use CSS selectors (jQuery has spoiled me!). Any ideas?

score 44 · Accepted Answer · edited Feb 12 '17 at 09:59

After Googling further (initial results weren't very helpful), it seems there is actually a Zend Framework library for this, along with some others:

edited Feb 12 '17 at 09:59

CubicleSoft

answered Nov 04 '08 at 02:19

Wilco

+1 phpQuery is absolutely wonderful. – Sampson Jul 17 '09 at 18:36
I tried out 3 of the items you listed. In the end, my choice is Simple HTML DOM, purely because they explain its usage simply and well. phpQuery got the job done, but I felt as if there was a lack of documentation and support. Zend successfully grabbed my query and counted it, but when it came to getting the values, it failed. Again, my suggestion is Simple HTML DOM. – NessDan Dec 10 '10 at 02:50
Although simple html dom is quite popular, a) it doesn't have good coverage of the full selector syntax b) it doesn't *appear* to be in active development. – Bobby Jack Dec 07 '11 at 11:42
I'm working with phpQuery for now: Zend_Dom_Query probably only helps if you're already using Zend Framework. Simple HTML DOM Parser looks too small. phpQuery looks good, also wraps DOMDocument which I'm already using everywhere in my tests, so it doesn't require reparsing for me. DomQuery has disappeared. pqLite is an option, but uses its own node structure, so requires reparsing the document. – qris Nov 16 '12 at 12:40
Fair warning! pqLite appears to be dead. The one search result I found linked out to a malware site. – CubicleSoft Feb 04 '17 at 19:42

score 9 · Answer 2 · answered Nov 04 '08 at 02:23

XPath is a fairly standard way to access XML (and XHTML) nodes, and provides much more precision than CSS.
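
As a rough illustration of this approach, the sketch below uses PHP's built-in DOMDocument and DOMXPath classes. The sample markup and the class name "post" are made up for the example. The first query is an approximate XPath counterpart of the CSS selector "div.post p"; the second selects the parent of an element, which XPath's ".." step supports directly.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div class="post"><p>Hello <strong>world</strong></p></div>'
      . '<div class="ad"><p>Buy now</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Approximate equivalent of the CSS selector "div.post p".
// The concat/normalize-space idiom matches one class name among several.
$paragraphs = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " post ")]//p'
);
foreach ($paragraphs as $p) {
    echo $p->textContent, "\n";
}

// Parent of every <strong> element.
$parents = $xpath->query('//strong/..');
echo $parents->item(0)->nodeName, "\n";
```

The class test is verbose compared with ".post", which is the usual trade-off: XPath is more general, CSS selectors are terser for the common cases.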

answered Nov 04 '08 at 02:23

nickf

+1 to bring to 0, but mainly because alternatives are always good. – eyelidlessness Nov 04 '08 at 07:05
wow, I was downvoted for this? I'm kinda interested as to why... – nickf Nov 04 '08 at 10:36
Wasn't me, the OP! :-) I actually think this would be the best alternative since XHTML is just a subset of XML. – Wilco Nov 04 '08 at 16:54
Sometimes people here are rather random. I agreed on XPath being a better tool to use, if it's available. It's standard, more powerful and quite similar to CSS-selectors anyway. – troelskn Nov 05 '08 at 15:01
NickF, there's nothing more "precise" about XPath... http://ejohn.org/blog/xpath-css-selectors/ There is one more option for selection, which is nice, but the CSS selectors are a lot cleaner, and understood by a wider audience. – cgp Apr 04 '09 at 18:37
See also: http://plasmasturm.org/log/444/ – cgp Apr 04 '09 at 18:39
In CSS you couldn't do anything like "select the parent of a 'strong' tag" – nickf Apr 05 '09 at 05:43
"Gee, why did I get downvoted when I didn't answer the question". You could have also said "Why are you using PHP? Ruby is much better!" – Sam Minnée Oct 29 '12 at 00:56
@SamMinnée really needed to bring up a 4 year old answer? Anyway, the question was about how to query an XML document. XPath would do exactly what was needed, so it's a valid answer IMO. – nickf Oct 29 '12 at 09:17
@nickf The OP specifically asked for CSS selectors, not XPath or anything else. So you didn't answer his question. Perhaps a downvote is unfair, but not inexplicable. Also there's nothing wrong with 4 year old answers. I found this question right now and it's still as relevant as it was when posted. Nothing wrong with keeping it alive. – qris Nov 16 '12 at 12:17

score 6 · Answer 3 · answered Jun 20 '10 at 00:58

Another one:
http://querypath.org/

answered Jun 20 '10 at 00:58

mario

Looks better than all the other options, to me - thanks! – Bobby Jack Dec 07 '11 at 12:06

score 6 · Answer 4 · edited Dec 14 '11 at 11:42

A great one is a component of Symfony 2, CssSelector\Parser (see its Introduction). It converts CSS selectors into XPath expressions. Take a look =)

Source code
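
To make the CSS-to-XPath idea concrete, here is a toy converter written for this illustration only. It is not the Symfony component's API: the function name `simpleCssToXPath` is hypothetical, and it handles just one compound selector of the form `tag`, `tag#id`, `tag.class`, `#id` or `.class` (no combinators, attributes or pseudo-classes). The sample markup is also made up.

```php
<?php
/**
 * Toy CSS-to-XPath converter (illustration only, hypothetical helper).
 * Supports a single compound selector: optional tag, optional #id,
 * and zero or more .class parts, in that order.
 */
function simpleCssToXPath(string $selector): string
{
    $pattern = '/^([a-z][a-z0-9]*)?(?:#([\w-]+))?((?:\.[\w-]+)*)$/i';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }

    // No tag name means "any element".
    $xpath = '//' . ($m[1] !== '' ? $m[1] : '*');

    if ($m[2] !== '') {
        $xpath .= '[@id="' . $m[2] . '"]';
    }

    // Each class becomes a whitespace-aware membership test on @class.
    preg_match_all('/\.([\w-]+)/', $m[3], $classMatches);
    foreach ($classMatches[1] as $class) {
        $xpath .= '[contains(concat(" ", normalize-space(@class), " "), " '
                . $class . ' ")]';
    }

    return $xpath;
}

// Usage with the built-in DOM extension.
$doc = new DOMDocument();
$doc->loadHTML('<div id="main" class="post featured"><p class="intro">Hi</p></div>');
$xpath = new DOMXPath($doc);

$expr = simpleCssToXPath('div.post');
echo $expr, "\n";

$nodes = $xpath->query($expr);
echo $nodes->length, "\n";
```

A real converter has to cover the full selector grammar, which is the reason to reach for a maintained component rather than a regular expression like this one.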

edited Dec 14 '11 at 11:42

hakre

answered Jul 12 '10 at 09:13

Clement Herreman

score 5 · Answer 5 · answered Jan 22 '09 at 16:00

For jQuery users, the most interesting option may be phpQuery, a port of jQuery to PHP. Almost all sections of the library are ported. It also contains a WebBrowser plugin, which can be used for scraping a whole site's paths and processes (e.g. accessing data available only after logging in). It simply simulates a web browser on the server (events and cookies too). The latest versions have experimental support for XML namespaces and the CSS3 "|" selector.

score 3 · Answer 6 · answered Feb 08 '11 at 19:08

I ended up using PHP Query Lite; it's very simple and has all I need.

answered Feb 08 '11 at 19:08

Mirko

Downvoted because this doesn't appear to exist any more. – Richard Jan 18 '17 at 12:09

score 2 · Answer 7 · answered Nov 19 '08 at 06:22

For document parsing I use DOM. This can quite easily solve your problem if you know the tag name (in this example "div"):

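 // Collect libxml parse warnings instead of printing them for messy HTML.
 libxml_use_internal_errors(true);

 $doc = new DOMDocument();
 $doc->loadHTML($html); // $html holds the page source as a string

 $elements = $doc->getElementsByTagName("div");
 foreach ($elements as $e) {
  // The class attribute may hold several space-separated names.
  $classes = preg_split('/\s+/', trim($e->getAttribute("class")));
  if (!in_array("someclass", $classes, true)) continue;

  // $e is a div.someclass
 }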

Not sure if DOM lets you get all elements of a document at once... you might have to do a tree traversal.
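
On that open question: DOMDocument::getElementsByTagName() accepts the special value "*", which matches every element, so a manual tree traversal should not be needed. A minimal sketch, with made-up markup and the same "someclass" name as above:

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div class="someclass other"><span>a</span></div>'
      . '<p class="someclass">b</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$matches = [];
// "*" returns all elements in document order, whatever their tag name.
foreach ($doc->getElementsByTagName('*') as $e) {
    $classes = preg_split('/\s+/', $e->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    if (in_array('someclass', $classes, true)) {
        $matches[] = $e->nodeName;
    }
}

print_r($matches);
```

This is the equivalent of the CSS selector ".someclass" with no tag restriction.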

This method is the fastest of all I've tested. Another to consider is [SmartDOMDocument](http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/) – jaggedsoft Nov 08 '15 at 05:32

score 1 · Answer 8 · answered Jul 28 '09 at 14:27

I wrote my own, based on the MooTools CSS selector engine (http://selectors.svn.exyks.org/). It relies on the SimpleXML extension (so it's read-only).

answered Jul 28 '09 at 14:27

131
