I'm trying to grab data from an xml.gz file with curl. I'm able to download the file, but can't get the usable XML with any of my attempts. When I try to print the XML, I'm getting a long list of garbled special characters such as:

‹ì½ûrâÈ–7ú?E~{Çž¨Ši°î—Ù5=ÁÍ6]`Ø€ë²ãDLÈ u

Is there a simple way to just uncompress and encode this xml? Possibly through SimpleXML? The files are large and do require authentication. Here's my current code:

$username='username';
$password='password';
$location='http://www.example.com/file.xml.gz';


$ch = curl_init ();
curl_setopt($ch,CURLOPT_URL,$location);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_USERPWD,"$username:$password");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_HEADER, 0);

$xmlcontent = curl_exec ($ch); 
curl_close($ch);

print_r($xmlcontent);

Thanks for your help!

dfcode3
  • Take care: gzip is _not_ zip, so "unzip" is the wrong approach... – arkascha Jul 07 '13 at 06:29
  • Thanks. My error in the explanation. Corrected now. – dfcode3 Jul 07 '13 at 06:32
  • So what speaks against just decompressing it with the existing PHP functions? Either `gzdecode` (http://www.php.net/manual/de/function.gzdecode.php) or `gzuncompress`, depending on how the result is actually encoded. – arkascha Jul 07 '13 at 06:35

4 Answers

You will need to pass the string through gzuncompress: http://www.php.net/manual/en/function.gzuncompress.php

AMADANON Inc.
  • hmm...I'm getting a data error with that function. Not sure if that means the uncompressed file is too large. – dfcode3 Jul 07 '13 at 06:43

I suggest you just decompress the result you fetch:

//[...]
$xmlcontent = gzdecode ( curl_exec($ch) ); 
curl_close($ch);
print_r($xmlcontent);

Obviously you should do some additional error checking; this is just the shortened general approach.
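As a rough sketch of what that error checking could look like: the helper below (the name `decodeGzippedXml` is made up for this example) checks the two gzip magic bytes, decompresses with gzdecode, and parses the result with SimpleXML. It assumes the zlib extension provides gzdecode (PHP 5.4 or later) and that SimpleXML is available; the sample payload is built locally with gzencode as a stand-in for a real curl response.

```php
<?php
// Hypothetical helper: turn a gzip-compressed response body into a SimpleXMLElement.
function decodeGzippedXml($body)
{
    // A gzip member is at least a 10-byte header plus an 8-byte trailer.
    if (!is_string($body) || strlen($body) < 18) {
        throw new RuntimeException('Empty or truncated response body');
    }
    // Every gzip stream starts with the magic bytes 0x1f 0x8b.
    if (substr($body, 0, 2) !== "\x1f\x8b") {
        throw new RuntimeException('Body is not gzip data');
    }
    $xmlString = @gzdecode($body);
    if ($xmlString === false) {
        throw new RuntimeException('gzdecode failed');
    }
    // Collect libxml errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        throw new RuntimeException('Decompressed data is not well-formed XML');
    }
    return $xml;
}

// Stand-in for the string returned by curl_exec($ch).
$body = gzencode('<feed><item id="1">hello</item></feed>');
$xml = decodeGzippedXml($body);
echo $xml->item, "\n"; // prints "hello"
```

Checking the magic bytes first helps tell a real gzip body apart from, say, an HTML error page returned after a failed login.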

Note that there are two similar functions provided by PHP: `gzuncompress` and `gzdecode`.

Most likely you have to use the second one, gzdecode, if the file really is a physical gzip-compressed file delivered by an HTTP server.
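To illustrate why the choice of function matters, here is a small sketch that compresses the same made-up XML string into the three container formats the zlib extension handles and shows that each decoder only accepts its own format. It assumes gzdecode is available (PHP 5.4 or later).

```php
<?php
$xml = '<root><item>1</item></root>';

$gzip = gzencode($xml);   // gzip format (RFC 1952): what a .gz file contains
$zlib = gzcompress($xml); // zlib format (RFC 1950)
$raw  = gzdeflate($xml);  // raw DEFLATE stream (RFC 1951), no header or trailer

// Each decoder round-trips data in its own format.
var_dump(gzdecode($gzip) === $xml);     // bool(true)
var_dump(gzuncompress($zlib) === $xml); // bool(true)
var_dump(gzinflate($raw) === $xml);     // bool(true)

// Feeding gzip data to gzuncompress fails with a data error.
var_dump(@gzuncompress($gzip));         // bool(false)

// The leading bytes tell the formats apart.
echo bin2hex(substr($gzip, 0, 2)), "\n"; // 1f8b (gzip magic bytes)
echo bin2hex(substr($zlib, 0, 1)), "\n"; // 78 (zlib header byte)
```

A "data error" from gzuncompress on a downloaded .gz file is therefore more likely a format mismatch than a size problem.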

Simon Kjellberg
arkascha
  • I tried that, but am also getting an undefined function error. I'm guessing I need to install a PHP library for that? I'm on PHP 5, so I would guess it should be default. – dfcode3 Jul 07 '13 at 06:46
  • Most php functions are provided by extensions, not all of those are installed necessarily in the php patterns defined by distributions. Check your software management system and install the "Zlib" php extension (called "php5-zlib" on openSUSE, might be slightly different in other distributions). – arkascha Jul 07 '13 at 06:48
  • Well, I checked the php.ini file, and also did a test on the server. It shows the Zlib installed and active, but still doesn't have the gzdecode functions. Other folks online seem to show hit and miss with that as well. I did find that `gzinflate(substr(curl_exec($ch),10,-8));` will give me the content of the xml, but it seemed to strip the xml format. 'gzread' from this solution [link](http://stackoverflow.com/questions/9768237/php-gzuncompress-with-file-read-and-write-errors) did the same thing. – dfcode3 Jul 07 '13 at 07:41
  • Any thoughts on other ways this might be accomplished or if those solutions could be changed to work? – dfcode3 Jul 07 '13 at 07:42
  • I'd say: don't invest in workarounds, solve the problem. What does "It shows the Zlib installed" mean? How did you test? Also, what php version and what environment do you use? Also check if you have the installation requirements installed: the zlib library. I could imagine php to hide the gzdecode function otherwise... Again: consult your software management system for this, it will suggest what you require. – arkascha Jul 07 '13 at 07:55

You first need to save the file to disk. Because it is gz-compressed, you need to uncompress it before you can access the XML. This can be done with PHP's compression stream wrappers (zlib://, bzip2://, zip://), here compress.zlib://:

$file = 'compress.zlib://file.xml.gz';  // note the compress.zlib:// wrapper prefix
$xml  = simplexml_load_file($file);

To get this to work, you need to have the ZLib extension installed/configured.

Using the wrapper means that you do not create an uncompressed copy of the file first (writing a second file can be a solution, too). Instead, the wrapper uncompresses the file's data transparently on the fly, so that the SimpleXML library can operate on uncompressed XML, which is what that library needs.
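A minimal end-to-end sketch of this approach follows. The function `downloadToFile` is a made-up helper showing one way to stream an authenticated download straight to disk with curl's CURLOPT_FILE option; it is defined but not called here. The demo part writes a small temporary .gz file with invented sample XML in place of the real download, then reads it through the wrapper.

```php
<?php
// Hypothetical helper: stream a password-protected download to disk so a
// large file never has to fit in memory. Not called in the demo below.
function downloadToFile($url, $userpwd, $dest)
{
    $fp = fopen($dest, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);          // write the body to $fp
    curl_setopt($ch, CURLOPT_USERPWD, $userpwd);  // "username:password"
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    return $ok !== false;
}

// Stand-in for the downloaded file.xml.gz.
$path = tempnam(sys_get_temp_dir(), 'xmlgz');
file_put_contents($path, gzencode('<catalog><book>PHP</book><book>XML</book></catalog>'));

// The compress.zlib:// prefix makes PHP decompress the file as it is read.
$xml = simplexml_load_file('compress.zlib://' . $path);
if ($xml === false) {
    unlink($path);
    throw new RuntimeException('Could not read or parse ' . $path);
}

$titles = array();
foreach ($xml->book as $book) {
    $titles[] = (string) $book;
}
unlink($path);

echo implode(', ', $titles), "\n"; // prints "PHP, XML"
```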

hakre

Not sure why, but none of the other answers worked for me in the end. zlib was installed on the server, but the gzdecode() function was not defined, and gzuncompress gave me errors, as did compress.zlib://. They might work for you, so give them a try as well.

If you need to check whether zlib is installed, this Stack Overflow answer or this answer can help. They provide this script:

<?php

echo phpversion().", ";

if (function_exists("gzdecode")) {
  echo "gzdecode OK, ";
} else {
  echo "gzdecode not OK, ";
}

if (extension_loaded('zlib')) {
  echo "zlib extension loaded ";
} else {
  echo "zlib extension not loaded ";
}

?>

This site gives another script that shows which zlib functions are installed:

var_dump(get_extension_funcs('zlib'));

SOLUTION!!! These 2 functions did the trick for me. Just curl or use file_get_contents to grab the xml file, then use this script:

$xmlcontent = gzinflate(substr($xmlcontent,10,-8));
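For context on why this works: a gzip file with no optional header fields has a fixed 10-byte header and an 8-byte trailer (CRC-32 plus uncompressed length), so substr($xmlcontent, 10, -8) leaves the raw DEFLATE stream that gzinflate expects. If the header carries optional fields such as the original file name, the offset of 10 is no longer right. The sketch below is one possible fallback, based on my reading of the gzip format (RFC 1952), that skips those optional fields; `gunzipFallback` is a made-up name and the test data is built by hand.

```php
<?php
// Hypothetical fallback for installations where gzdecode() is not defined.
// Returns the decompressed string, or false if $data is not a valid gzip member.
function gunzipFallback($data)
{
    $len = strlen($data);
    // Smallest possible member: 10-byte header + 8-byte trailer.
    if ($len < 18 || substr($data, 0, 3) !== "\x1f\x8b\x08") {
        return false;
    }
    $flags = ord($data[3]);
    $pos = 10; // magic(2) + method(1) + flags(1) + mtime(4) + xfl(1) + os(1)

    if ($flags & 0x04) { // FEXTRA: 2-byte little-endian length, then the field
        $extra = unpack('vlen', substr($data, $pos, 2));
        $pos += 2 + $extra['len'];
    }
    foreach (array(0x08, 0x10) as $bit) { // FNAME, FCOMMENT: zero-terminated strings
        if ($flags & $bit) {
            $end = ($pos < $len) ? strpos($data, "\0", $pos) : false;
            if ($end === false) {
                return false;
            }
            $pos = $end + 1;
        }
    }
    if ($flags & 0x02) { // FHCRC: 2-byte header checksum
        $pos += 2;
    }
    if ($pos >= $len - 8) {
        return false; // no room left for compressed data
    }
    $out = @gzinflate(substr($data, $pos, -8));
    if ($out === false) {
        return false;
    }
    // The trailer starts with the little-endian CRC-32 of the uncompressed data.
    if (pack('V', crc32($out)) !== substr($data, -8, 4)) {
        return false;
    }
    return $out;
}

$payload = '<root>ok</root>';

// A gzip member that stores the original file name (FNAME flag set),
// built by hand for this example.
$named = "\x1f\x8b\x08\x08" . "\0\0\0\0" . "\x00\x03" . "file.xml\0"
    . gzdeflate($payload) . pack('V', crc32($payload)) . pack('V', strlen($payload));

// The fixed-offset shortcut lands on the file name instead of the DEFLATE data.
var_dump(@gzinflate(substr($named, 10, -8)) === $payload);
// The fallback skips the optional field first.
var_dump(gunzipFallback($named) === $payload);
// It also handles the plain header that gzencode() writes.
var_dump(gunzipFallback(gzencode($payload)) === $payload);
```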

OR use this script to open the saved .gz file and read its contents (see more here):

$zd = gzopen($filename,"r");         // $filename: path to the saved .gz file
$contents = gzread($zd,$fileSize);   // $fileSize: number of uncompressed bytes to read
gzclose($zd);
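One thing to watch with gzread: its length argument counts uncompressed bytes, so passing the size of the .gz file on disk would usually cut the output short. A sketch that avoids needing the size at all reads in chunks until gzeof reports the end. The temporary file and sample XML here are invented stand-ins for the real download.

```php
<?php
// Stand-in for the saved download.
$filename = tempnam(sys_get_temp_dir(), 'xmlgz');
$original = '<root>' . str_repeat('<row>data</row>', 5000) . '</root>';
file_put_contents($filename, gzencode($original));

$zd = gzopen($filename, 'rb');
if ($zd === false) {
    throw new RuntimeException('Could not open ' . $filename);
}

// Read fixed-size chunks of uncompressed data until the end of the stream.
$contents = '';
while (!gzeof($zd)) {
    $chunk = gzread($zd, 8192);
    if ($chunk === false || $chunk === '') {
        break;
    }
    $contents .= $chunk;
}
gzclose($zd);
unlink($filename);

echo strlen($contents), " bytes uncompressed\n";
```

If the result seems to lose its tags when printed in a browser, it may simply be that the browser is treating the XML elements as markup; printing htmlspecialchars($contents) instead should show the tags as text.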

Thanks to all who helped me get this answer. Hope this helps someone else!

dfcode3