
So, I have a database with 435,453 rows of data, and 271 different companies that each need specific information from it. I am aggregating it by selecting all the data with a single MySQL query, which returns all rows in 1.28 seconds. Fine.

Then I want to traverse the result and write it to files (using fputcsv()), one set of files per company. So I start by calling fopen() 271 times to create the files, and then for each returned row I check which companies should have data from that row and write to their files accordingly.
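
The handle setup looks roughly like this (simplified sketch: the real file names are different; $xsps maps each company to the sites it should get data for):

foreach ($xsps as $xsp => $sites){
    # One tab-separated file and one semicolon-separated file per company.
    $text[$xsp] = fopen("export_" . $xsp . ".txt", "w");
    $csv[$xsp] = fopen("export_" . $xsp . ".csv", "w");
}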

Now, using memory_get_usage() I can see that memory usage stays at a steady 6 MB during the entire process, and using microtime() I am timing each iteration of the while() loop that reads from the DB.

The end result is that each iteration takes about 0.00001 seconds, so for each row in the DB it takes 0.00001 seconds to determine which companies should have that row and to write it to their files.

But something is messed up, because after 100 minutes the process is still not done, and the while() loop advances about 100 rows per second. If my math is correct, traversing 435,453 rows at 0.00001 seconds each should take about 4.4 seconds.

Here is the code, or the part that is taking so long:

$q=mysql_query(bq($query)) or print mysql_error();
while ($r=mysql_fetch_assoc($q)){
    $nr++;
    $now = microtime(true);
    if (($nr % 100) == 0 && $argv[1] == "debug"){
        print "$nr: " . fsize(memory_get_usage()) . ", time: " . sprintf("%.5f", ($now - $savetime)) . "\n";
    }
    foreach ($xsps as $xsp => $sites){
        if (!in_array($r["citynet"], $sites)) continue; # skip companies that should not get this row
        $data = format_fields($r, $xsp);
        $values = array_values($data);
        $keys = array_keys($data);
        $linefeed = "\n";
        # TXT
        if ($nr == 1) fwrite($text[$xsp], join("\t", $keys) . $linefeed);
        fputcsv($text[$xsp], $values, "\t");
        # CSV
        if ($nr == 1) fwrite($csv[$xsp], join(";", $keys) . $linefeed);
        fputcsv($csv[$xsp], $values, ";");
    }
    $savetime = microtime(true);
}

And the output, which prints every 100 rows, looks something like this:

12600: 6 Mb, time: 0.00000
12700: 6 Mb, time: 0.00000
12800: 6 Mb, time: 0.00000
12900: 6 Mb, time: 0.00001
13000: 6 Mb, time: 0.00000
13100: 6 Mb, time: 0.00000
13200: 6 Mb, time: 0.00000

So, obviously: what am I doing wrong? How can it take so long when the numbers say it doesn't?

EDIT: So, obviously, my timing calculation was wrong, so I edited it to:

while ($r=mysql_fetch_assoc($q)){
    $nr++;
    $start = microtime(true);
    if (($nr % 100) == 0 && $argv[1] == "debug"){
        print "$nr: " . fsize(memory_get_usage()) . ", time: " . sprintf("%.5f", $processtime) . "\n";
    }
    ...
    $stop = microtime(true);
    $processtime = ($stop - $start);
}

And now it reports that each row takes about 0.015 seconds to complete, which means the entire process takes around 108 minutes. So the actual question now is: why is it so slow? Is it fputcsv() that is slow, or is it PHP?
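
To narrow it down, I figure I can split the per-row time into a formatting bucket and a writing bucket, something like this inside the foreach (just a sketch; $t_format and $t_write are made-up names, initialized to 0 before the loop):

$t0 = microtime(true);
$data = format_fields($r, $xsp);
$values = array_values($data);
$t_format += microtime(true) - $t0;

$t0 = microtime(true);
fputcsv($text[$xsp], $values, "\t");
fputcsv($csv[$xsp], $values, ";");
$t_write += microtime(true) - $t0;

Printing $t_format and $t_write every 100 rows should show whether the time goes to the PHP formatting or to the actual writes to disk.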

Sandman
  • The timing looks incorrect. After the foreach loop, you store the time, and then return to the beginning of the while loop and get the time again. `$now` is calculated almost immediately after `$savetime`. I think you should be calculating `$savetime` when `$nr % 100 == 0` and `$savetime` should go at the beginning, and the `$now` time needs to go after the foreach, and calculate the difference then. – drew010 Jun 27 '16 at 15:43
  • @Ctc No, this is run from the command line – Sandman Jun 27 '16 at 15:44
  • @sandman My bad, didn't really read it properly. Let me read again. – Ctc Jun 27 '16 at 15:45
  • @drew010 You're obviously correct, only the solution is to calculate the offset within one loop - so one $start at the beginning and then a $stop at the end and then a $calculate = $stop - $start. This ends up with each row taking 0.015 seconds, which means the entire process would take 108 minutes. I'll edit the question and ask why it is taking so long – Sandman Jun 27 '16 at 15:48
  • 1
    Try storing results/lines in temporary variables and dump them to files at the end of the loop or once in a while ($nr % 100 or 1000) and see how execution time changes. Write operations are bottlenecks in most cases. – Michal Bieda Jun 27 '16 at 16:09
  • 1
    Please [stop using `mysql_*` functions](http://stackoverflow.com/questions/12859942/why-shouldnt-i-use-mysql-functions-in-php). [These extensions](http://php.net/manual/en/migration70.removed-exts-sapis.php) have been removed in PHP 7. Learn about [prepared](http://en.wikipedia.org/wiki/Prepared_statement) statements for [PDO](http://php.net/manual/en/pdo.prepared-statements.php) and [MySQLi](http://php.net/manual/en/mysqli.quickstart.prepared-statements.php) and consider using PDO, [it's really pretty easy](http://jayblanchard.net/demystifying_php_pdo.html). – Jay Blanchard Jun 27 '16 at 16:14
  • Writing to 271 files at random might also take its toll on a spinning disk. Did you test your random write speed? – Johan Jun 27 '16 at 16:17
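
A rough sketch of the buffering idea from Michal Bieda's comment above (the $buf array, the 1000-row threshold, and joining with tabs instead of using fputcsv's quoting are simplifications, not code from the question):

$buf = array();
while ($r = mysql_fetch_assoc($q)){
    $nr++;
    foreach ($xsps as $xsp => $sites){
        if (!in_array($r["citynet"], $sites)) continue;
        $values = array_values(format_fields($r, $xsp));
        # Collect lines in memory instead of hitting the disk for every row.
        $buf[$xsp][] = join("\t", $values) . "\n";
    }
    if (($nr % 1000) == 0){
        foreach ($buf as $xsp => $lines){
            fwrite($text[$xsp], join("", $lines));
        }
        $buf = array();
    }
}
# Remember to flush whatever is left in $buf after the loop ends.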

1 Answer


I have no way to debug your code or your database.

But I would start by changing this deprecated mysql_* code:

$q=mysql_query(bq($query)) or print mysql_error();
while ($r=mysql_fetch_assoc($q)){

to a PDO version, something like this:

$dsn = 'mysql:dbname=testdb;host=127.0.0.1';
$user = 'dbuser';
$password = 'dbpass';

try {
    $dbh = new PDO($dsn, $user, $password);
} catch (PDOException $e) {
    echo 'Connection failed: ' . $e->getMessage();
}

$sth = $dbh->prepare($query);
$sth->execute();

$result = $sth->fetchAll(PDO::FETCH_ASSOC); // <-- this fetches all records from MySQL at once instead of one by one
foreach ($result as $r) {
    // ... the same per-row processing as in the question ...
}
Alex
  • I'm in the process of doing just that, but that's a whole other thing, the MySQL connection and query is not what is taking time here. – Sandman Jun 27 '16 at 16:11
  • Actually, getting elements one by one, with immediate processing and without keeping result arrays, is more memory efficient. – Arnial Jun 27 '16 at 16:12
  • Pumping 435,453 records into memory seems like a bad idea. Perhaps change your example into using a simple `while($row = $sth->fetch(PDO::FETCH_ASSOC)) { ... }` (sketched below) – Pieter Jun 27 '16 at 16:12
  • I don't think fetching 435,453 rows of data into memory is the way to go here. And I doubt changing to PDO will give a big performance boost. It would be good to update to PDO though. – Johan Jun 27 '16 at 16:12
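
A minimal sketch of the row-by-row alternative suggested in the comments above, reusing the answer's $dbh connection (the per-row processing itself stays the same as in the question):

$sth = $dbh->prepare($query);
$sth->execute();

# Fetch one row at a time instead of loading all 435,453 rows into memory.
while ($r = $sth->fetch(PDO::FETCH_ASSOC)){
    # ... same per-row processing as in the question ...
}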