I have a large rhyme database with 360,000 words (entries). Every word has a category (for example, 'sheet' and 'meet' have the category 'eet'). A query to find suitable rhymes is rather slow on my web space, so I want to speed it up by converting the categories into hashes that contain only digits. (I heard that is faster; is that right?)

Which hash algorithm should I use to hash single-word strings? The result should contain only digits.

Or do you have other suggestions to speed up the database queries?

Thanks!

Crayl
  • This isn't clear. You want to hash your strings, but then what? – Oliver Charlesworth May 29 '12 at 17:29
  • If you just create an index on your column, MySQL will do this for you. MySQL also supports [fulltext search](http://dev.mysql.com/doc/en/fulltext-search.html), which may be of interest to you. – eggyal May 29 '12 at 17:29
  • You shouldn't have to do this. An index on the involved columns (like "category") would be even faster than your hash/hack. – Denys Séguret May 29 '12 at 17:29
  • Before you go hashing everything, why not post the slow query here, along with a little more detail such as the data structures? You could also paste the output of EXPLAIN for your query so we can see why it is running so slowly. You might just be missing an index. 360000 words is tiny. – Namphibian May 29 '12 at 17:30
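
The index suggestion from the comments can be sketched roughly as follows. This is only an illustration: the table name `words` and the columns `word` and `category`, as well as their types and the index name, are assumptions, since the actual schema was not posted.

```sql
-- Hypothetical schema; names and column sizes are assumptions for illustration.
CREATE TABLE IF NOT EXISTS words (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  word     VARCHAR(64)  NOT NULL,
  category VARCHAR(32)  NOT NULL
);

-- Index the column that the rhyme lookup filters on.
CREATE INDEX idx_words_category ON words (category);

-- Ask MySQL how it plans to run the lookup; on a populated table the
-- "key" column of the output should name idx_words_category.
EXPLAIN SELECT word FROM words WHERE category = 'eet';
```

With an index like this, an equality lookup on the string column no longer has to scan every row, so replacing the category with a numeric hash should not be necessary.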

1 Answer

Perhaps you could implement the Levenshtein algorithm in MySQL as a stored function. Below is an example; I hope it helps:

DELIMITER //
CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
  RETURNS INT
  DETERMINISTIC
  BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char CHAR;
    -- max strlen=255
    DECLARE cv0, cv1 VARBINARY(256);
    SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
    IF s1 = s2 THEN
      RETURN 0;
    ELSEIF s1_len = 0 THEN
      RETURN s2_len;
    ELSEIF s2_len = 0 THEN
      RETURN s1_len;
    ELSE
      WHILE j <= s2_len DO
        SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
      END WHILE;
      WHILE i <= s1_len DO
        SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
        WHILE j <= s2_len DO
          SET c = c + 1;
          IF s1_char = SUBSTRING(s2, j, 1) THEN
            SET cost = 0;
          ELSE
            SET cost = 1;
          END IF;
          SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
          IF c > c_temp THEN SET c = c_temp; END IF;
          SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
          IF c > c_temp THEN
            SET c = c_temp;
          END IF;
          SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
        END WHILE;
        SET cv1 = cv0, i = i + 1;
      END WHILE;
    END IF;
    RETURN c;
  END//
DELIMITER ;

Source http://www.artfulsoftware.com/infotree/queries.php#552 (Fixed by adding DELIMITER //)
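
Once the function exists, it can be tried directly from the MySQL client before wiring it into PHP. A small sketch, assuming a `words` table with a `word` column as in the test script; the `CHAR_LENGTH` pre-filter is an addition of this sketch, not part of the original function, and relies on the fact that the distance between two strings is never smaller than the difference of their lengths:

```sql
-- Distance between two literals; turning 'eet' into 'meet' takes one insertion.
SELECT levenshtein('eet', 'meet') AS distance;

-- Words within distance 1 of 'eet'. The CHAR_LENGTH condition is an optional
-- pre-filter (an assumption of this sketch) that may let MySQL skip rows
-- whose length alone rules them out, without changing the result.
SELECT word
FROM words
WHERE CHAR_LENGTH(word) BETWEEN CHAR_LENGTH('eet') - 1 AND CHAR_LENGTH('eet') + 1
  AND levenshtein('eet', word) <= 1;
```

Note that the function still has to be evaluated row by row, so a query like this cannot use an index on `word` the way a plain equality lookup can.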

Test example script

<?php
//I inserted these test words into a test database
$words = 'back, lack, pack, rack, sack, tack, yak, black, knack, quack, slack, smack, snack, stack, track, whack, attack,bale, fail, hail, mail, male, nail, pail, tale, rail, sail, stale, scale, snail, whale, detail, email,air, bare, care, chair, dare, fair, hair, pair, rare, wear, chair, flare, stare, scare, share, spare, square, there, where, aware, beware, compare, declare, despair, prepare, repair, unfair,ache, bake, fake, lake, make, rake, take, brake, break, flake, quake, snake, steak, awake, mistake,all, ball, call, doll, hall, fall, tall, crawl, small, baseball, football,an, can, fan, man, pan, ran, tan, van, plan, scan, span, began,and, band, hand, land, sand, bland, command, demand, expand, stand, understand,cap, gap, map, nap, tap, zap, chap, clap, flap, slap, snap, strap, trap, wrap,are, bar, car, far, jar, tar, star, scar, afar, guitar,at, bat, fat, mat, pat, rat, sat, flat, that, splat, combat,ate, date, fate, mate, late, gate, rate, wait, crate, great, plate, skate, slate, state, straight, trait, weight, create,bed, dead, fed, head, led, read, red, said, bread, fled, spread, thread, tread, instead,bell, fell, sell, well, yell, shell, smell, spell, farewell, hotel, motel,den, hen, men, pen, ten, glen, then, when, wren, again,bet, get, jet, let, met, pet, set, vet, wet, yet, threat, barrette, reset, upset,bin, chin, in, pin, tin, grin, thin, twin, skin, begin, within,king, ring, sing, wing, zing, bring, cling, fling, sling, spring, sting, string, swing, thing,bit, fit, hit, it, kit, lit, pit, sit, flit, knit, quit, skit, slit, spit, split, admit, commit, permit,bite, kite, bright, fight, fright, knight, night, might, right, tight, white, write, delight, tonight,go, hoe, low, mow, row, sew, toe, blow, crow, dough, flow, know, glow, grow, know, show, slow, snow, stow, though, throw, ago, although, below,cot, dot, got, hot, lot, not, pot, rot, tot, bought, fought, knot, taught, shot, spot, squat, forgot,crowned, found, ground, hound, mound, pound, round, 
sound, wound, around, surround,bows, hose, nose, rose, toes, blows, flows, froze, grows, those,cub, rub, sub, tub, club, stub, scrub, shrub ,bun, fun, gun, one, run, son, sun, ton, won, done, none, begun, outdone, undone';

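// A class for the database connection
class DB {
    private $db;
    // Declare the connection settings explicitly instead of relying on
    // dynamic properties (deprecated as of PHP 8.2).
    private $dbhost;
    private $dbname;
    private $dbuser;
    private $dbpass;

    function __construct($host, $dbname, $user, $pass){
        $this->dbhost = $host;
        $this->dbname = $dbname;
        $this->dbuser = $user;
        $this->dbpass = $pass;
    }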

    private function connect(){
        if (!$this->db instanceof PDO){
            $this->db = new PDO('mysql:dbname='.$this->dbname.';host='.$this->dbhost, $this->dbuser, $this->dbpass);
            $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        }
    }

    // Model method: return the words within Levenshtein distance $dist of $word.
    public function levenshtein_query($word, $dist){
        $this->connect();
        // Bind both values rather than interpolating $dist into the SQL string.
        $sql = "SELECT `word` FROM `words` WHERE levenshtein( :word ,`word` ) BETWEEN 0 AND :dist";
        $statement = $this->db->prepare($sql);
        $statement->bindValue(':word', $word, PDO::PARAM_STR);
        $statement->bindValue(':dist', (int) $dist, PDO::PARAM_INT);
        $statement->execute();
        return $statement->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Initialize the model class
$model = new DB('localhost','test_db','root','');

// The posted word
$word = 'eet';
$result = $model->levenshtein_query($word,1);

print_r($result);
/*
//The Result
Array
(
    [0] => Array
        (
            [word] => bet
        )

    [1] => Array
        (
            [word] => get
        )

    [2] => Array
        (
            [word] => jet
        )

    [3] => Array
        (
            [word] => let
        )

    [4] => Array
        (
            [word] => met
        )

    [5] => Array
        (
            [word] => pet
        )

    [6] => Array
        (
            [word] => set
        )

    [7] => Array
        (
            [word] => vet
        )

    [8] => Array
        (
            [word] => wet
        )

    [9] => Array
        (
            [word] => yet
        )

    [10] => Array
        (
            [word] => meet
        )

)

*/

Perhaps it's of some interest...

Lawrence Cherone