Possible Duplicate:
Best XML Parser for PHP

I am a newbie to PHP and cURL, so please give simple steps! :)

I am trying to scrape data from a website that is returning XML data as HTML.

cURL retrieves the response as '5814 3300' instead of the source:

<?xml version="1.0" encoding="iso-8859-1"?><app><info><bookID>58</bookID><firstbook><t>14 </t><status>3</status></firstbook><nextbook><t>30</t><status>0</status></nextbook></info></app>

which I need (so I can run preg_match on the results).

What can I do to transform the '5814 3300' output into the XML that I need? Thanks!

PLEASE NOTE: This question was asked by me in a confused state. cURL does indeed output the source.

ryanswj
  • Can you tell me why I cannot use cURL to scrape XML? My understanding of this is not very deep. Thanks! – ryanswj Jun 20 '11 at 15:36
  • You *can* use cURL for that, but you *should not*. Unless `allow_url_fopen` is disabled in your host's php.ini, any of the XML/HTML parsers mentioned above can load the URI directly. They also give you much more control over the markup than any regex would, because XML/HTML parsers actually understand markup rules, while a regex has to be taught these rules first (and that's tedious). – Gordon Jun 20 '11 at 15:41
  • I see. This is why regex is not picking up anything at all. Could you point me to a really simple tutorial to scrape XML? I've searched around and I've seen XML scraping tutorials, but they use the 'foreach' code and they seem over-complicated. Ultimately, what I want to do is just extract the value between the `<t>` and `</t>` tags in `<t>14 </t>`. – ryanswj Jun 20 '11 at 15:54
  • There are lots of examples in answers I have given. See http://stackoverflow.com/search?q=user%3A208809+dom+html – Gordon Jun 20 '11 at 16:00
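
As a minimal sketch of the parser approach described in the comments, the value inside the `<t>` tags can be read with SimpleXML rather than preg_match. The XML string below is copied from the question; in a real script it would be the body returned by cURL, which is an assumption here. The variable names `$firstT` and `$nextT` are illustrative only.

```php
<?php
// XML copied from the question; in practice this would be the cURL response body.
$xml = '<?xml version="1.0" encoding="iso-8859-1"?><app><info><bookID>58</bookID><firstbook><t>14 </t><status>3</status></firstbook><nextbook><t>30</t><status>0</status></nextbook></info></app>';

// Parse the string into a SimpleXMLElement; false means the XML was not well-formed.
$app = simplexml_load_string($xml);
if ($app === false) {
    exit('Could not parse the XML');
}

// Walk the element names shown in the question and trim the stray whitespace.
$firstT = trim((string) $app->info->firstbook->t);
$nextT  = trim((string) $app->info->nextbook->t);

echo $firstT, "\n";
echo $nextT, "\n";
```

No loop is needed when the path to the element is known in advance, which keeps this shorter than the foreach-based tutorials.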

2 Answers

I bet if you looked at the actual source (not what is being rendered on screen) you would see the full XML representation.

John Cartwright

Are you outputting that XML to your browser? If you're outputting an HTML content-type, the browser will skip all those unknown tags and simply show their contents. If you view the page source, you'll most likely see the actual XML.
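
One way to check this without opening the page source is to escape the markup before echoing it, so the browser displays the tags instead of skipping them. This is only a sketch: `$response` is a hypothetical variable standing in for the string returned by `curl_exec()` with `CURLOPT_RETURNTRANSFER` enabled, and the XML is copied from the question.

```php
<?php
// Stand-in for the cURL response body (assumption: the real value comes from curl_exec()).
$response = '<?xml version="1.0" encoding="iso-8859-1"?><app><info><bookID>58</bookID><firstbook><t>14 </t><status>3</status></firstbook><nextbook><t>30</t><status>0</status></nextbook></info></app>';

// Convert <, >, & and quotes to HTML entities so the tags are shown as text.
$escaped = htmlspecialchars($response, ENT_QUOTES, 'ISO-8859-1');

// <pre> keeps the whitespace as-is when the page is rendered as HTML.
echo '<pre>', $escaped, '</pre>';
```

The escaping is for display only; the unescaped `$response` is what should be passed to an XML parser.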

Marc B