
I want to get data from this URL: http://livingsocial.com/cities.atom. Each time I hit this URL the browser gets stuck. I tried hitting it directly, through curl, and with file_get_contents(), but the result is the same.

This URL returns a huge XML document from which I have to collect the desired information and save it in a database.

Please help me accomplish this task, or at least tell me how to get this XML.

hakre

3 Answers


I once faced the same problem. To get the structure of this URL, open it in Chrome and stop loading after a second or two; it will show the structure of the XML. Complete the last one or two tags and enjoy. I am pasting the structure here:

<?xml version="1.0"?>
  <feed xmlns:ls="http://livingsocial.com/ns/1.0" xmlns="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss" xml:lang="en-US">
  <title>LivingSocial Deals</title>
  <updated>2013-03-12T00:49:21-04:00</updated>
  <id>tag:livingsocial.com,2005:/cities.atom</id>
  <link rel="alternate" type="text/html" href="http://www.livingsocial.com/"/>
  <link rel="self" type="application/atom+xml" href="http://www.livingsocial.com/cities.atom"/>
    <entry>
      <id></id>
      <published></published>
      <updated></updated>
      <link type="text/html" href="http://www.livingsocial.com/cities/1759-sacramento-citywide/deals/620554-set-of-two-organic-yoga-leggings" rel="alternate"/>
      <title></title>
      <long_title></long_title>
      <deal_type></deal_type>
      <merchandise_type></merchandise_type>
      <market_id></market_id>
      <market_name></market_name>
      <georss:point></georss:point>
      <georss:featureTypeTag>city</georss:featureTypeTag>
      <country_code>US</country_code>
      <subtitle></subtitle>
      <offer_ends_at></offer_ends_at>
      <price></price>
      <value></value>
      <savings></savings>
      <orders_count></orders_count>
      <merchant_name></merchant_name>
      <image_url></image_url>
      <categories></categories>
      <sold_out></sold_out>
      <national></national>
      <description></description>
      <details></details>
      <content type="html"></content>
      <ls:merchant></ls:merchant>
      <author>
        <name></name>
      </author>
    </entry>
  </feed>
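Note that the feed mixes three namespaces: Atom is the default, and `georss:`/`ls:` elements are prefixed. With SimpleXML the prefixed elements are not reachable through plain property access; you need `children()` with the namespace URI. A minimal sketch (not the asker's code, and using a small inline sample rather than the real 90 MB feed):

```php
<?php
// Sketch: reading default-namespace and prefixed elements from the feed
// structure above. The sample data here is made up for illustration.

$xml = <<<'ATOM'
<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:ls="http://livingsocial.com/ns/1.0"
      xmlns:georss="http://www.georss.org/georss">
  <entry>
    <title>Sample Deal</title>
    <georss:point>38.58 -121.49</georss:point>
    <georss:featureTypeTag>city</georss:featureTypeTag>
  </entry>
</feed>
ATOM;

$feed = simplexml_load_string($xml);
foreach ($feed->entry as $entry) {
    // <title> is in the same (default Atom) namespace as <entry>,
    // so plain property access works:
    $title = (string) $entry->title;

    // <georss:point> is in another namespace; fetch it via children():
    $georss = $entry->children('http://www.georss.org/georss');
    $point  = (string) $georss->point;

    echo "$title @ $point\n";
}
```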
Muhammad Tahir

I can't manage to even load the file in my browser, so my guess is that it is excessively large and you should try to limit the amount you have to load somehow (are there parameters which let you specify only one city?). But if that is not an option, the first example here has a class which should do roughly what you're looking for. Just be sure to pass a URL instead of the contents of the cURL request.
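The linked class isn't reproduced here, but the general idea is a pull parser that never holds the whole 90 MB document in memory. A minimal sketch with PHP's built-in XMLReader (demonstrated on a small inline sample written to a temp file, so it runs without network access; `XMLReader::open()` accepts a URL just as well as a path):

```php
<?php
// Sketch: pull-parse <entry> elements one at a time with XMLReader
// instead of loading the entire feed at once.

$sample = <<<'ATOM'
<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Deal A</title></entry>
  <entry><title>Deal B</title></entry>
</feed>
ATOM;

// Stand-in for the real feed URL, so the sketch is self-contained.
$tmp = tempnam(sys_get_temp_dir(), 'atom');
file_put_contents($tmp, $sample);

$reader = new XMLReader();
$reader->open($tmp);

$titles = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->localName === 'entry') {
        // Expand only this <entry> subtree into SimpleXML for easy access.
        $entry    = simplexml_import_dom($reader->expand());
        $titles[] = (string) $entry->title;
    }
}
$reader->close();
unlink($tmp);

print_r($titles);
```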

cwallenpoole
  • No, this solution does not work at all; rather it flags the following error message: Error: Cannot open hhttp://livingsocial.com/cities.atom – Ahsan Habib Mar 11 '13 at 19:46

The URL http://www.livingsocial.com/cities.atom is just large (94,354,882 bytes, roughly 90 MB) and takes its time to load (33 seconds here).

As this is a remote resource, you cannot change that.

However, if you store that feed to disk (cache it), you can reduce the time to load the file into SimpleXML or DOMDocument to ca. 1.5 seconds.

// Store the URL to disk (takes ca. 33 seconds)
$url = 'http://www.livingsocial.com/cities.atom';
$out = 'cities.atom.xml';
$fh  = fopen($url, 'r');               // open a read stream on the remote feed
$r   = file_put_contents($out, $fh);   // stream it straight into the local file
fclose($fh);

If that is still too slow, you need to cache not only the remote file but also the result of parsing it.
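One way to cache the parsing step is to extract the fields you need once and serialize them to disk; later runs skip XML parsing entirely. A sketch under those assumptions (file names, field choice, and the `load_deals` helper are all illustrative, not part of the original answer):

```php
<?php
// Sketch: cache the parsed result alongside the cached feed file.
// First run parses the XML and stores the extracted rows; subsequent
// runs unserialize the rows as long as the cache is newer than the feed.

function load_deals(string $xmlFile, string $cacheFile): array {
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($xmlFile)) {
        // Cache hit: no XML parsing at all.
        return unserialize(file_get_contents($cacheFile));
    }

    $feed  = simplexml_load_file($xmlFile);
    $deals = [];
    foreach ($feed->entry as $entry) {
        $deals[] = [
            // Field names follow the feed structure shown in the other answer.
            'title' => (string) $entry->title,
            'price' => (string) $entry->price,
        ];
    }

    file_put_contents($cacheFile, serialize($deals));
    return $deals;
}
```

From here, inserting `$deals` into the database row by row is straightforward, and the expensive 90 MB parse happens only when the cached feed actually changes.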

hakre