1

On my CakePHP 1.3 application there is a product page. On this page it shows the current product plus 2 related products.

The related products are found based on this function on the Product Model:

public function related($id, $limit = 2)
{
    $item = $this->find('first',array(
        'fields'=>array('Product.style_number','Product.brand'),
        'conditions'=>array('Product.id'=>$id),
        'recursive'=>0,
    ));

    $data = $this->find('all',array(
        'fields'=>array('Product.id','Product.name','Product.image','Product.url','levenshtein(Product.style_number,"'.$item['Product']['style_number'].'") as dist'),
        'limit'=>$limit,
        'conditions'=>array('NOT'=>array('Product.id'=>$id),'Product.brand'=>$item['Product']['brand']),
        'order'=>'dist',
    ));

    return $data;
}

This function finds the products with the closest style number based on the levenshtein distance. In the query, levenshtein() is a user defined MySQL function you can view source here

When I test this on a table with about 100 rows its fairly fast. However my Product table currently has 10K rows and growing.

I tried adding 'Product.brand'=>$item['Product']['brand'] to limit how many rows it operates on and I also made Product.style_number an index in hopes to speed it up.

Its still pretty slow, it causes about a 2-3 seconds delay when loading the page.

What can I do to make this fast? Is there a way I can cache it..if so how?

Is there a different way I can get the same data faster?

What are my options?

The results I am getting are pretty accurate though, its finding the closest related products.

JD Isaacks
  • 56,088
  • 93
  • 276
  • 422
  • Have you tried it without the distance function? – Matt Mills Jul 19 '11 at 18:45
  • Without the distance function the page loads fast but the related products are inaccurate. I am starting to think maybe I should load the page without it first, then use ajax to load the related products since they aren't the primary part of the page anyways. – JD Isaacks Jul 19 '11 at 18:49
  • 2
    levenshtein calculations are very expensive, in the range of O(n*m). if you're doing it on multiple rows, the costs add up very quickly: http://stackoverflow.com/questions/4057513/levenshtein-distance-algorithm-better-than-onm – Marc B Jul 19 '11 at 18:49
  • That was going to be my recommendation. – Matt Mills Jul 19 '11 at 18:50

1 Answers1

2

It sounds like this would be a good application for an AJAX workflow. If the distance calculation is taking most of the time (and especially if it's not a primary part of the page) then you should be able to realize substantial speed-up by running that query separately.

Matt Mills
  • 8,692
  • 6
  • 40
  • 64