I'm having problems sending an array to another PHP page. We send an array from one page to another to generate a CSV file that has been transformed from XML. We take an 800 MB XML file and transform it down to a 20 MB CSV file. We remove a lot of the information in it, and the transformation runs for 30 minutes.

Anyway, we are periodically using a function to output the progress of the transformation in the browser with messages:

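function outputResults($message) {
    // Print the message, then push it to the browser straight away.
    echo $message . "<br>";
    if (ob_get_level() > 0) {
        ob_flush(); // flush PHP's output buffer, if one is active
    }
    flush(); // ask the web server to send what it has buffered
}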

$masterArray contains all the information we have parsed from the XML, stored in an associative array.

At the end, we send the array ($masterArray) from index.php to another PHP file called create_CSV_file.php.

Originally we used include('create_CSV_file.php') within index.php, but because create_CSV_file.php sets headers for the CSV download, it gave us this warning:

Warning: Cannot modify header information - headers already sent

So we started looking at a solution that passes the array in a link, as below.

echo "<a href='create_CSV_file.php?data=$masterArray'>**** Download CSV file ***</a>";

I keep getting this error message from the echo above:

Notice: Array to string conversion

What is the best method to be able to show echo statements from the server as it is running, then be able to download the result CSV at the end?

  • Don't use `serialize()` for this purpose. Use `http_build_query()`. – Amal Murali Dec 12 '15 at 04:06
  • It seems like you would be better off tucking the array in the session, since it's already server-side and there's no reason to shove it through the page – pvg Dec 12 '15 at 04:07
  • I would not save the data as a session variable; I would store it as a server-side file. That way you can verify data integrity and operation fulfillment. – tony gil Dec 12 '15 at 13:09

1 Answer


Ok, so first of all, passing data in a URL (GET) has some severe limitations. Older versions of Internet Explorer only supported URLs of up to 2,083 characters. In addition, some proxies and other software impose their own limits.
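To get a feel for how quickly an array outgrows a URL, here is a minimal sketch. The sample rows are a hypothetical stand-in for $masterArray, and the 2,000-character figure is only an assumed conservative limit, not a value from any specification.

```php
<?php
// Hypothetical stand-in for $masterArray: 1,000 small rows.
$masterArray = [];
for ($i = 0; $i < 1000; $i++) {
    $masterArray[] = ['id' => $i, 'name' => "item$i"];
}

// http_build_query() URL-encodes nested arrays into a query string.
$query = http_build_query(['data' => $masterArray]);
$urlLength = strlen('create_CSV_file.php?' . $query);

// Assumed conservative limit; real limits vary by browser, server and proxy.
$conservativeLimit = 2000;

echo "URL length: $urlLength characters\n";
echo $urlLength > $conservativeLimit
    ? "Exceeds the assumed limit\n"
    : "Fits within the assumed limit\n";
```

Even this tiny data set overshoots the assumed limit many times over, and the real array is parsed from hundreds of megabytes of XML.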

I'm sure you've heard this before, but in case you haven't: you should not run a process that takes more than a couple of seconds (at most!) inside a web request. Web servers are not optimised for it. You definitely don't want to be passing megabytes of data to the client just so they can send it back to the server!

How about something like this...

  • User makes a web request (And uploads original data?) to the server
  • Server allocates an ID for the request (random? database?) and creates a file on disk using the ID as a name (tmp directory, or at least outside web root)
  • Server launches a new process (PHP?) to transform the data. As it runs, it can update the database with progress information
  • During this time, the user can check progress by making a sequence of AJAX requests (or just refreshing a page which shows the latest status). This gives you much more control over how progress is presented
  • When the processing is complete, server-side process writes results to file, updates database to indicate completion.
  • Next time user checks status, redirect them to a PHP file that takes the ID and will read the file from disk / stream it to the user.
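The steps above could be sketched roughly as follows. This is not a definitive implementation: it swaps the database for a JSON status file per job to stay self-contained, and every function name, file name and directory here is hypothetical. How the worker process is launched is left out.

```php
<?php
// Sketch only: job tracking with one JSON status file per job (instead of a database).

function assertValidId(string $id): void
{
    // Accept only IDs in the format createJob() produces, to block path traversal.
    if (!preg_match('/^[0-9a-f]{32}$/', $id)) {
        throw new InvalidArgumentException('Invalid job ID');
    }
}

function writeStatus(string $dir, string $id, array $status): void
{
    assertValidId($id);
    // LOCK_EX takes an exclusive lock while the file is being written.
    file_put_contents("$dir/$id.status.json", json_encode($status), LOCK_EX);
}

function readStatus(string $dir, string $id): array
{
    assertValidId($id);
    $path = "$dir/$id.status.json";
    if (!is_file($path)) {
        return ['state' => 'unknown', 'percent' => 0];
    }
    return json_decode(file_get_contents($path), true);
}

function createJob(string $dir): string
{
    $id = bin2hex(random_bytes(16)); // random, hard-to-guess job ID
    writeStatus($dir, $id, ['state' => 'queued', 'percent' => 0]);
    return $id;
}

function sendCsv(string $dir, string $id): void
{
    assertValidId($id);
    $file = "$dir/$id.csv";
    if (!is_file($file)) {
        http_response_code(404);
        return;
    }
    // Headers are sent before any other output, which avoids
    // the "headers already sent" warning.
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="export.csv"');
    header('Content-Length: ' . filesize($file));
    readfile($file); // writes the file to the output without building a giant string
}

// Demonstration using the system temp directory (outside the web root).
$dir = sys_get_temp_dir();
$id = createJob($dir);

// The worker process would call this periodically while transforming the XML.
writeStatus($dir, $id, ['state' => 'running', 'percent' => 40]);

// The AJAX status endpoint would return this JSON to the browser.
$status = readStatus($dir, $id);
echo json_encode($status), "\n";
```

In this sketch the download script would call sendCsv() only once the status reports completion, so the progress page and the CSV response never share a request.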

Benefits:

  • No long-running http requests
  • No data being passed back/forth to client in intermediate stage
  • Much more control over how users see progress
  • Depending on the transformation you're applying and the detail stored in the database, you may be able to recover interrupted jobs (e.g. after a server failure)

It does have one downside: you need to clean up after yourself, because the files you created on disk need to be deleted. However, you've got a complete audit of all files in the database, so deleting anything over x days old would be trivial.
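A cleanup pass might look something like the sketch below. It is an assumption-laden simplification: it uses each file's modification time rather than database records, and the function name, directory layout and 7-day threshold are all hypothetical.

```php
<?php
// Sketch only: delete CSV files older than $maxAgeDays from a job directory.
function cleanUpOldJobs(string $dir, int $maxAgeDays): int
{
    $cutoff = time() - $maxAgeDays * 86400; // 86400 seconds per day
    $deleted = 0;
    // glob() can return false on error, so fall back to an empty list.
    foreach (glob($dir . '/*.csv') ?: [] as $file) {
        if (is_file($file) && filemtime($file) < $cutoff && unlink($file)) {
            $deleted++;
        }
    }
    return $deleted;
}

// Demonstration in a throwaway directory.
$dir = sys_get_temp_dir() . '/jobs_' . bin2hex(random_bytes(4));
mkdir($dir);
file_put_contents("$dir/old.csv", "a,b\n");
file_put_contents("$dir/new.csv", "a,b\n");
touch("$dir/old.csv", time() - 10 * 86400); // pretend this file is 10 days old

$deleted = cleanUpOldJobs($dir, 7);
echo "Deleted $deleted file(s)\n";
```

A scheduled task could run a function like this once a day; with a database in place, the same pass would also remove or mark the matching job rows.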

Basic
  • Thanks Basic. You've given me some things to think about. We are running this machine with XAMPP, so it's already self-contained and is essentially used just for running this script. The process is all PHP. – user3230730 Dec 13 '15 at 01:05
  • No problem. Don't get me wrong, there's no reason the process to do the transformation shouldn't be PHP and take as long as you like. It's just that it shouldn't be invoked by the webserver to service a request. Holding a connection open for the duration of the processing will cause issues in the webserver (runs out of handles/resources, can't free memory, etc), it's also likely to require you setting outrageous timeout values (making you vulnerable to DoS). It's fragile with some older proxies (*much* less common problem nowadays but still non-zero). Also, it's not a good user experience – Basic Dec 13 '15 at 01:23