
I've hacked together some PHP code that can probably be improved in terms of its efficiency and loading speed. How can I improve it? Bearing in mind by no means am I great at coding!!

At present the page takes about 5-10 seconds to load.

It's looping through hundreds of WordPress posts, inserting them into a new MySQL table and geocoding a lat/long for each. It also checks whether the post is already in the database.

Here's the code below:

<?php
require("database.php");
// Opens a connection to a MySQL server
$con = mysql_connect("localhost", $username, $password);

if (!$con)
{
    die('Could not connect: ' . mysql_error());
}

mysql_select_db("medicom_wp", $con);

$pages = get_posts(array(
    'orderby' => 'title', 
    'post_type' => 'members',
    'numberposts' => 300,
    'post_status' => 'any'  
    ));
foreach($pages as $post) {
    setup_postdata($post);

    $company = get_field('company_name');
    $address = get_field('address');
    $city = get_field('city');
    $post_code = get_field('post_code');

    $sql = sprintf("select count('x') as cnt from markers where `name` = '%s'", mysql_real_escape_string($company));
    $row_dup = mysql_fetch_assoc(mysql_query($sql,$con));
    if ($row_dup['cnt'] == 0) {
        mysql_query("INSERT INTO markers (`name`, `address`, `lat`, `lng`, `type`) VALUES ('".$company."', '".$address.", ".$city.", ".$post_code."', '0.0', '0.0', '')");
    }
}
wp_reset_query();


define("MAPS_HOST", "maps.google.com");
define("KEY", "");

// Opens a connection to a MySQL server
$connection = mysql_connect("localhost", $username, $password);
if (!$connection) {
  die("Not connected : " . mysql_error());
}

// Set the active MySQL database
$db_selected = mysql_select_db($database, $connection);
if (!$db_selected) {
  die("Can\'t use db : " . mysql_error());
}

// Select all the rows in the markers table
$query = "SELECT * FROM markers WHERE 1";
$result = mysql_query($query);
if (!$result) {
  die("Invalid query: " . mysql_error());
}

// Initialize delay in geocode speed
$delay = 0;
$base_url = "http://" . MAPS_HOST . "/maps/geo?output=xml" . "&key=" . KEY;

// Iterate through the rows, geocoding each address
while ($row = @mysql_fetch_assoc($result)) {
  $geocode_pending = true;

  while ($geocode_pending) {
    $address = $row["address"];
    $id = $row["id"];
    $request_url = $base_url . "&q=" . urlencode($address);
    $xml = simplexml_load_file($request_url) or die("url not loading");

    $status = $xml->Response->Status->code;
    if (strcmp($status, "200") == 0) {
      // Successful geocode
      $geocode_pending = false;
      $coordinates = $xml->Response->Placemark->Point->coordinates;
      $coordinatesSplit = split(",", $coordinates);
      // Format: Longitude, Latitude, Altitude
      $lat = $coordinatesSplit[1];
      $lng = $coordinatesSplit[0];

      $query = sprintf("UPDATE markers " .
             " SET lat = '%s', lng = '%s' " .
             " WHERE id = '%s' LIMIT 1;",
             mysql_real_escape_string($lat),
             mysql_real_escape_string($lng),
             mysql_real_escape_string($id));
      $update_result = mysql_query($query);
      if (!$update_result) {
        die("Invalid query: " . mysql_error());
      }
    } else if (strcmp($status, "620") == 0) {
      // sent geocodes too fast
      $delay += 1000;
    } else {
      // failure to geocode
      $geocode_pending = false;
      echo "Address " . $address . " failed to geocoded. ";
      echo "Received status " . $status . "
\n";
    }
    usleep($delay);
  }
}
?> 
Rob
  • And what parts do take so long? Have you checked the efficiency of your database queries? How long do they take? Maybe you can improve the speed here by creating additional indexes. – acme Jun 15 '12 at 14:35
  • You should use callbacks instead of using usleep to wait until things are done. When you iterate over "hundreds" of posts you are multiplying the inefficiency of usleep hundreds of times. – Alex W Jun 15 '12 at 14:35
  • This belongs on Code Review.SE: http://codereview.stackexchange.com/ – Jeroen Jun 15 '12 at 14:36
  • @acme I'm pretty sure it's the inserting into the mysql database and the delay on the geocode (although they may be nothing I can do about that!). – Rob Jun 15 '12 at 14:38

3 Answers


You need to restructure your code to make use of MySQL's multi-row INSERT syntax. This should speed up performance considerably.

Instead of doing things like this, hundreds of times:

INSERT INTO table ( col1, col2) VALUES ( val1, val2)
INSERT INTO table ( col1, col2) VALUES ( val3, val4)
... 

You do this, once:

INSERT INTO table ( col1, col2) VALUES ( val1, val2), ( val3, val4)

Now, lets apply it to your code:

$values = array();

foreach($pages as $post) {
    setup_postdata($post);

    $company = get_field('company_name');
    $address = get_field('address');
    $city = get_field('city');
    $post_code = get_field('post_code');

    // Escape the values, and combine address, city and post code into the single `address` column
    $values[] = sprintf("('%s', '%s', 0.0, 0.0, '')",
        mysql_real_escape_string($company),
        mysql_real_escape_string($address . ', ' . $city . ', ' . $post_code));
}

$query = 'INSERT IGNORE INTO markers (`name`, `address`, `lat`, `lng`, `type`)
              VALUES ' . implode(', ', $values);
mysql_query($query, $con) or die('Insert failed: ' . mysql_error());

Now, instead of checking whether the row already exists (with that $row_dup query), make sure you have a unique index on the name column, and do either INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE.

You can learn more about these queries from this SO question.
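
For illustration, here's a minimal sketch of the two variants, assuming the markers table from the question with a unique index added on name (the sample values are made up):

ALTER TABLE markers ADD UNIQUE KEY uniq_name (`name`);

-- silently skip rows whose name already exists
INSERT IGNORE INTO markers (`name`, `address`, `lat`, `lng`, `type`)
VALUES ('Acme Ltd', '1 High St, London, N1 1AA', 0.0, 0.0, '');

-- or refresh the address when the name already exists
INSERT INTO markers (`name`, `address`, `lat`, `lng`, `type`)
VALUES ('Acme Ltd', '1 High St, London, N1 1AA', 0.0, 0.0, '')
ON DUPLICATE KEY UPDATE address = VALUES(address);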

nickb
  • Ok sounds good, do you think it would speed things up by quite a bit? – Rob Jun 15 '12 at 14:39
  • I'm betting it would bring your execution time down *a lot*. Probably under a second, if you just focus on the query execution time – nickb Jun 15 '12 at 14:40
  • Profiling would indicate if this is an issue. Using InnoDB (or any transactional engine), wrapping the inserts in a transaction would yield similar results by reducing the amount of commits to disk. – Louis-Philippe Huberdeau Jun 15 '12 at 14:42
  • @nickb Thanks for the code, I gave it a go but it doesn't insert anything into the table. Any ideas why? – Rob Jun 15 '12 at 14:48
  • Getting any errors? Are the rows already inserted in the markers table? – nickb Jun 15 '12 at 15:50

Many individual INSERT statements will slow you down. If you are doing a heck of a lot of them, you could write your data out to a CSV file and then load it with LOAD DATA INFILE; with large data sets the performance difference can be massive.

Links

http://php.net/manual/en/function.fputcsv.php

http://dev.mysql.com/doc/refman/5.1/en/load-data.html
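
A rough sketch of that approach, reusing the variables from the question (the /tmp/markers.csv path is an assumption, and the MySQL user needs the FILE privilege plus read access to that path):

// Write the rows collected from WordPress out to a CSV file
$fp = fopen('/tmp/markers.csv', 'w');
foreach ($pages as $post) {
    setup_postdata($post);
    fputcsv($fp, array(
        get_field('company_name'),
        get_field('address') . ', ' . get_field('city') . ', ' . get_field('post_code'),
        0.0, 0.0, ''
    ));
}
fclose($fp);

// Bulk-load the file in a single statement instead of hundreds of INSERTs
mysql_query("LOAD DATA INFILE '/tmp/markers.csv'
             INTO TABLE markers
             FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
             LINES TERMINATED BY '\\n'
             (`name`, `address`, `lat`, `lng`, `type`)", $con) or die(mysql_error());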

@nickb has a suggestion which will certainly help

allen213
  • I agree with this solution over that provided below (multiple inserts simultaneously) because of the headaches you may encounter with the memory allotted to mysql. When in doubt and know the data sets are huge, csv -> mysql is pretty dependable. – EmmanuelG Jun 15 '12 at 14:46
  • @allen213 Thanks, would this gather up all the companies' info, put it in a csv file, then load that into the mysql table in one go? It has to somehow end up in the table at the end. – Rob Jun 15 '12 at 15:00
  • Yes it would, you just described what would happen – allen213 Jun 15 '12 at 15:05

You can also speed this up by caching the results of your geocoding requests, to limit the number of calls to Google's geocoding service on subsequent page loads. I usually save the time of the request as well as the accuracy, or "location_type", of the result. I then check each marker's accuracy and the time since the last request to decide whether to update its information. The higher the accuracy, the more time I allow between requests.

https://developers.google.com/maps/documentation/geocoding/

This is basic, but you have several accuracy options:

// re-geocode when accuracy is poor, or when the last result is over 30 days old
if ($accuracy != 'ROOFTOP' || time() > ($update_time + 2592000))
{
    // make a geocode request
}

ROOFTOP accuracy is as good as you're going to get with Google's geocoding. There is no need to keep updating unless Google changes something. This is why I set it to 30 days.
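
Put together, a minimal sketch of that check against the markers table (the location_type and updated columns are hypothetical additions, not in the question's schema):

// Assumes two extra columns on markers: `location_type` VARCHAR and `updated` INT (unix timestamp)
$result = mysql_query("SELECT `id`, `address`, `location_type`, `updated` FROM markers");
while ($row = mysql_fetch_assoc($result)) {
    $is_rooftop = ($row['location_type'] == 'ROOFTOP');
    $is_stale   = time() > ($row['updated'] + 2592000); // 30 days

    if (!$is_rooftop || $is_stale) {
        // ...make the geocode request here, then write lat/lng,
        // the returned location_type and time() back to the row
    }
}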

Nick