I have a long-running script in a shared hosting environment that outputs a large amount of XML.

Sometimes (only sometimes) a random GZIP header will appear in my output, and the output will be terminated.

For instance

0000000: 3c44 4553 435f 4c4f 4e47 3e3c 215b 4344  <DESC_LONG><![CD
0000010: 4154 415b 1fc2 8b08 0000 0000 0000 03c3  ATA[............
0000020: b3c3 8b57 c388 c38c 2b28 2d51 48c3 8bc3  ...W....+(-QH...
0000030: 8c49 5528 2e48 4dc3 8e4c c38b 4c4d c391  .IU(.HM..L..LM..
0000040: c3a3 0200 c291 4464 c383 1900 0000 0d0a  ......Dd........

or

0000000: 3c2f 5052 4f44 5543 543e 0d0a 1fc2 8b08  </PRODUCT>......
0000010: 0000 0000 0000 03c3 b3c3 8b57 c388 c38c  ...........W....
0000020: 2b28 2d51 48c3 8bc3 8c49 5528 2e48 4dc3  +(-QH....IU(.HM.
0000030: 8e4c c38b 4c4d c391 c3a3 0200 c291 4464  .L..LM........Dd
0000040: c383 1900 0000 0d0a                      ........

or

0000000: 3c4d 4544 4941 5f55 524c 3e2f 696d 6167  <MEDIA_URL>/imag
0000010: 6573 2f69 6d70 6f72 7465 642f 7374 6f63  es/imported/stoc
0000020: 6b5f 7072 6f64 3235 3339 365f 696d 6167  k_prod25396_imag
0000030: 655f 3531 3737 3439 3436 302e 6a70 673c  e_517749460.jpg<
0000040: 2f4d 4544 4941 5f55 1fc2 8b08 0000 0000  /MEDIA_U........
0000050: 0000 03c3 b3c3 8b57 c388 c38c 2b28 2d51  .......W....+(-Q
0000060: 48c3 8bc3 8c49 5528 2e48 4dc3 8e4c c38b  H....IU(.HM..L..
0000070: 4c4d c391 c3a3 0200 c291 4464 c383 1900  LM........Dd....
0000080: 0000 0d0a                                ....

The switch to GZIP does not seem to happen at any particular time or byte count; it can be after 1MB of data or after 15MB.

The corresponding lines of the compiled Blade template are as follows:

<DESC_LONG><![CDATA[<?php echo $product->display_name; ?>]]></DESC_LONG>

-

</PRICES>
</PRODUCT>
<?php foreach($product->models()->get() as $model): ?>

-

<MEDIA_URL>/images/imported/<?php echo $picture->local_name; ?></MEDIA_URL>

I am at my wits' end. I have tried the following:

  • Disable gzip on the server.
  • Run while(ob_get_level()){ ob_end_clean(); } before running the script
  • In .htaccess I have tried SetEnv no-gzip 1, SetEnv no-gzip dont-vary and various permutations thereof.

When I visit other pages, no gzip encoding or headers appear, so I'm thinking this is something with the output size or output buffer.

Kristoffer Sall-Storgaard
  • This may or may not be useful, but I see that the gzip stream has had inserted into it a bunch of extraneous "0xc3" bytes, I think to try to make it look like valid UTF-8. – Mark Adler Feb 04 '14 at 20:33
  • Does it happen if you run in console (CLI)? Or if you wget the script in http://127.0.0.1/ (localhost) ? – Tan Hong Tat Feb 07 '14 at 12:26
  • The script works perfectly when run on my local machine – Kristoffer Sall-Storgaard Feb 07 '14 at 12:28
  • `max_execution_time` is 30, so if you don't extend the execution time, it will terminate after 30s. And gzip is on. If you can post the script url, it can be easier for us to see. – Tan Hong Tat Feb 07 '14 at 12:38
  • Would it be possible to provide a larger sample of the bogus output, starting from the `1fc2 8b08` bytes? – Álvaro González Feb 07 '14 at 13:10
  • The output from the system is terminated after `0000 0d0a` – Kristoffer Sall-Storgaard Feb 07 '14 at 13:12
  • @HongTat The `max_execution_time` is altered for my script, it has been changed to `300`, the script terminates with bogus output well before that. – Kristoffer Sall-Storgaard Feb 07 '14 at 13:13
  • When was the last time you updated Laravel? Does it have this patch: https://github.com/laravel/laravel/pull/1586 Browsers may be asking for the Accept-Encoding: gzip, deflate option. – arikin Feb 10 '14 at 02:22
  • Have you tried defining a doctype when outputting the XML? The browser needs a doctype to show the XML. – Viswanath Polaki Feb 10 '14 at 08:54
  • This may be really obvious, but just to be sure, you've checked your system's log files? (dmesg, /var/log/messages, apache log), and tried turning up all debugging options you can find? – Martin Tournoij Feb 10 '14 at 09:15
  • I'm wondering if there is some kind of caching proxy getting in the way? – edmondscommerce Feb 10 '14 at 18:01
  • I noticed that your php.ini is located in /compile/php53/dest/lib, which isn't a standard location, and that the PHP lib that the server uses wasn't compiled using --enable-cli, which makes me suspect you're running a different version/configuration when executing via the php command. Try the command 'php -i' (or 'php5 -i') to see if the ini for the CLI is in a different location. This is common when the CLI is installed using a package manager and the CGI is custom built. – JSON Feb 11 '14 at 01:37
  • @ClosetGeek It's on shared hosting, and I have no access to PHP via the command line – Kristoffer Sall-Storgaard Feb 11 '14 at 07:19
  • I see. You might want to post a link to a copy of the configuration file and phpinfo() of the working box. This is most likely a configuration issue if it works fine on one but not another. If possible, you might also want to post the code that handles this section of the script, or a link to the script if it's from an open source project. – JSON Feb 11 '14 at 07:58
  • Randomly appearing headers make it sound like another request or process is responsible... too bad you're on a shared host, which makes isolating and debugging such a problem rather difficult. – AD7six Feb 11 '14 at 20:28
  • Can this be related?: http://stackoverflow.com/questions/4410704/why-would-one-omit-the-close-tag – Gustavo Rubio Feb 14 '14 at 06:18

3 Answers

1

Did you finally find out where these headers come from? I mean, Apache or PHP?

You can simulate the XML generator script with something like:

echo file_get_contents('your_good_test.xml');

If you don't see any headers, I suggest debugging your XML generator. You can try calling header_remove(); before any output.

If you do see headers, you have to debug your web server. Try disabling gzip in Apache with a rewrite rule:

`RewriteRule . - [E=no-gzip:1]`

Whenever you have a proxy or load balancer (nginx, squid, haproxy) in front of the server, you automatically get one more place where this can go wrong.
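
To tell whether the header comes from PHP or from the web server, it can help to log what PHP itself has queued before any XML is sent. A minimal sketch, assuming a hypothetical helper named describeOutputState (not part of the original script); every function it calls is a standard PHP built-in:

```php
<?php
// Hypothetical helper (not from the original script): collects everything on
// the PHP side that could add compression or extra headers to the response.
function describeOutputState()
{
    return array(
        'ob_level'                => ob_get_level(),
        'ob_handlers'             => ob_list_handlers(),
        'zlib.output_compression' => ini_get('zlib.output_compression'),
        'output_handler'          => ini_get('output_handler'),
        'headers_sent'            => headers_sent(),
        'headers'                 => headers_list(),
    );
}

// Write to the error log rather than the response, so the XML stays untouched.
error_log('output state: ' . json_encode(describeOutputState()));
```

If `ob_handlers` lists `ob_gzhandler`, or `zlib.output_compression` is enabled, PHP is doing the compressing. If neither shows up, the web server or a proxy in front of it is the more likely suspect.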

Ostin
1

Your gzipping is not related to the server output that returns your main XML body. Otherwise the whole XML would be compressed.

The following accessors sometimes return gzip data, because the source they read from is set up to support gzip and is not being queried properly:

$product->display_name
$product->models()->get()
$picture->local_name

Look inside these:

  • Check all web calls for places where headers are set.
  • Temporarily disable compression for the database connection, if any.

Add CDATA tags everywhere binary data could be returned, so that it cannot terminate the building of the main XML body. Then wait for an XML containing binary data, save the binary data, unzip it and look at what is inside. :-)
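
To look at what is inside the binary data, the sample from the question can be decoded directly. This is a sketch under one assumption: that the extra 0xc2/0xc3 bytes mentioned in the comments come from the gzip stream having been re-encoded from Latin-1 to UTF-8. The helper name undoUtf8Mangling is made up for this example; hex2bin and gzdecode need PHP 5.4 or later.

```php
<?php
// Assumption: the stream was re-encoded Latin-1 -> UTF-8, so every byte in the
// range 0x80-0xFF became a two-byte sequence starting with 0xC2 or 0xC3.
// This hypothetical helper collapses each such pair back into one byte.
function undoUtf8Mangling($bytes)
{
    return preg_replace_callback(
        '/[\xC2\xC3][\x80-\xBF]/',
        function ($m) {
            return chr(((ord($m[0][0]) & 0x03) << 6) | (ord($m[0][1]) & 0x3F));
        },
        $bytes
    );
}

// Bytes copied from the first hex dump in the question, starting at 1f c2 8b 08
// and stopping before the trailing 0d 0a (one string per dump row).
$hex = '1fc28b0800000000000003c3'
     . 'b3c38b57c388c38c2b282d5148c38bc3'
     . '8c4955282e484dc38e4cc38b4c4dc391'
     . 'c3a30200c2914464c38319000000';

$gzip  = undoUtf8Mangling(hex2bin($hex));
$plain = gzdecode($gzip); // returns false (with a warning) on a corrupt stream

echo $plain === false ? "not a valid gzip stream\n" : $plain;
```

If the assumption holds, the script prints whatever text was compressed, which usually points to the component that produced it.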

1

This is more of a set of comments, but it is too long for the comment box.

First, this is very likely NOT an output buffer issue. Even though <![CDATA[ and ]]> are not within PHP tags, this doesn't mean that they don't pass through PHP's output buffer. To be clear, anything within a .php file will be placed in the PHP output buffer. The content of a .php file (including static content) is buffered outside of Apache and then passed back to Apache through this buffer when the script is finished. This means that your problem must lie within the code itself, which is a shot in the dark to solve without viewing the code.

My suggestions:

1) Search the script for any use of the gz functions (gzcompress, gzdeflate, gzdecode, etc). I have seen scripts compress content if it was larger than a specific size and then decompress it on the fly when it was read from the DB. If that is the case, you are likely dealing with a faulty comparison. In short, the logic in the compression and decompression conditions is slightly off, so it fails to decompress SOME of the content.

2) Search the script to see how this data is fetched. Is it all from a database? Does any of it come from a stream? Is any of it fetched remotely? These questions might not lead directly to an answer, but they are vital. It can safely be assumed that these variables are being set with data that is already compressed when it shouldn't be. You need to know where, why and how the compression takes place in order to work out why the data is not being decompressed.

3) It matters greatly that it is working as expected on one system but not the other. The only times I have seen this happen, it was due to differences in configuration. What operating system is your local machine using? What are the differences in the local database (if any)? Which extensions are missing or present on one machine but not the other, possibly causing a function to fall back on a different procedure?
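
One way to test whether the values themselves arrive compressed (points 1 and 2) is to check each one for the gzip magic bytes before it is echoed. A rough sketch; looksGzipped and checkedField are hypothetical helpers, not functions from the script or from Laravel:

```php
<?php
// Hypothetical helper: true when a value starts with the gzip magic bytes,
// either raw (1f 8b) or after a Latin-1 -> UTF-8 re-encoding (1f c2 8b).
function looksGzipped($value)
{
    return is_string($value)
        && (strncmp($value, "\x1f\x8b", 2) === 0
            || strncmp($value, "\x1f\xc2\x8b", 3) === 0);
}

// Hypothetical guard for template values: logs the field name when the value
// looks compressed and returns the value unchanged either way.
function checkedField($name, $value)
{
    if (looksGzipped($value)) {
        error_log("field {$name} holds gzip data (" . strlen($value) . ' bytes)');
    }
    return $value;
}
```

In the template this would wrap a value, for example `echo checkedField('display_name', $product->display_name);`. If nothing is ever logged while the broken output still appears, the compressed bytes are not coming from these values and the search should move elsewhere.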

EDIT: Also, and this is a small chance, but are you dealing with data that originated from an SQL dump from a different server? You said it works on your local host but not on a different host, so we know you're dealing with two machines. Was there a third at some point? If so, the data might have been compressed using a mismatched version/form of compression, or it might be an issue with encoding.

JSON