PHP CURL - scraping xml data that is returned as HTML

Question

Possible Duplicate:
Best XML Parser for PHP

I am a newbie to PHP and cURL, so please give simple steps! :)

I am trying to scrape data from a website that is returning XML data as HTML.

cURL retrieves the response as '5814 3300' instead of the source

<?xml version="1.0" encoding="iso-8859-1"?><app><info><bookID>58</bookID><firstbook><t>14 </t><status>3</status></firstbook><nextbook><t>30</t><status>0</status></nextbook></info></app>

which I need (so I can do preg_match on the results)

What can I do to transform the '5814 3300' output into the XML that I need? Thanks!

PLEASE NOTE: This question was asked by me in a confused state. cURL does indeed output the source.

Can you tell me why I cannot use cURL to scrape XML? My understanding of this is not very deep. Thanks! – ryanswj, Jun 20 '11 at 15:36
You *can* use cURL for that, but you *should not*. Unless `allow_url_fopen` is disabled in your host's php.ini, any of the XML/HTML parsers mentioned above can load the URI directly. They give you much more control over the markup than any regex would, because XML/HTML parsers actually understand markup rules, while a regex has to be taught these rules first (and that's tedious). – Gordon, Jun 20 '11 at 15:41
I see. This is why regex is not picking up anything at all. Could you point me to a really simple tutorial to scrape XML? I've searched around and I've seen XML scraping tutorials, but they use the 'foreach' code and seem over-complicated. Ultimately, what I want to do is just extract the value between the <t> and </t> tags in <t>14 </t> – ryanswj, Jun 20 '11 at 15:54
There are lots of examples in answers I have given. See http://stackoverflow.com/search?q=user%3A208809+dom+html – Gordon, Jun 20 '11 at 16:00
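
The parser approach from the comments above can be sketched without any 'foreach', assuming the response looks exactly like the sample in the question. This is only an illustration using PHP's SimpleXML extension; the variable names are made up for the example, and the XML is hard-coded here in place of the real response. With a real address, `simplexml_load_file()` could be given the URL directly when `allow_url_fopen` is enabled, or the string fetched with cURL could be passed to `simplexml_load_string()` as shown.

```php
<?php
// Sample response copied from the question (assumption: the live
// response has the same structure).
$xml = '<?xml version="1.0" encoding="iso-8859-1"?><app><info><bookID>58</bookID><firstbook><t>14 </t><status>3</status></firstbook><nextbook><t>30</t><status>0</status></nextbook></info></app>';

// Parse the string into an object tree; false means the XML was malformed.
$app = simplexml_load_string($xml);
if ($app === false) {
    exit("Could not parse the XML\n");
}

// Walk to the elements by name and cast them to strings.
// trim() removes the trailing space in "<t>14 </t>".
$firstT = trim((string) $app->info->firstbook->t);
$nextT  = trim((string) $app->info->nextbook->t);

echo $firstT, "\n"; // 14
echo $nextT, "\n";  // 30
```

Because the parser addresses elements by name, the `<t>` inside `<firstbook>` and the one inside `<nextbook>` are easy to tell apart, which a single regex on `<t>` would not do on its own.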

score 1 · Accepted Answer · answered Jun 20 '11 at 15:04


I bet if you looked at the actual source (not what is being rendered on screen) you would see the full XML representation.

John Cartwright

you're right; I'm confused. sorry to take your time! – ryanswj Jun 20 '11 at 15:42
Don't forget to accept answers when you find one that resolves your issue the best. :) – John Cartwright Jun 20 '11 at 16:02

score 0 · Answer 2 · answered Jun 20 '11 at 15:05


Are you outputting that XML to your browser? If you're outputting an HTML content-type, the browser will skip all those unknown tags and simply show their contents. If you view the page source, you'll most likely see the actual XML.
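
A small sketch of why the page shows '5814 3300', assuming the sample XML from the question: the text content of the document, with every tag dropped, is exactly that string. The variable names are hypothetical. Escaping the markup with `htmlspecialchars()` before echoing it into an HTML page is one way to make the tags visible while debugging.

```php
<?php
// Sample response copied from the question.
$xml = '<?xml version="1.0" encoding="iso-8859-1"?><app><info><bookID>58</bookID><firstbook><t>14 </t><status>3</status></firstbook><nextbook><t>30</t><status>0</status></nextbook></info></app>';

// Roughly what a browser renders when the XML is served as HTML:
// unknown tags are ignored and only their text is shown.
$doc = new DOMDocument();
$doc->loadXML($xml);
$textOnly = $doc->documentElement->textContent;

// Escaping < and > makes the markup itself show up on the page.
$escaped = htmlspecialchars($xml);

echo $textOnly, "\n"; // 5814 3300
echo $escaped, "\n";
```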

Marc B

you're right; I'm confused. sorry to take your time! – ryanswj Jun 20 '11 at 15:41
