2

In trying to rejuvinate code I wrote mostly 14+ years ago. I've come to see that the lovely little setup I wrote then was... lacking in certain places, namely handling user inputs.

Lesson: Never underestimate users ability to inject trash, typos, and dupes past your validators.

The old way is reaching critical mass as there are 470 items in a SELECT dropdown now. I want to reinvent this part of the process so I don't have to worry about it hitting a breaking point.

So the idea is to build a fuzzy search method so that after the typist enters the search string, we check against five pieces of data, all of which reside in the same row.

I need to check the name submitted against the Stage Name, two also-known-as names, as well as their legal name and as a final check against a soundex() index based on their Stage Name (this catches a few spelling errors missed otherwise)

I've tried a complicated block of code to check these things (and it doesn't work, mostly because I think I coded the comparisons too strict) as part of a do/while loop.

In the below, var $Rin would contain the user supplied name.

$setr = mysql_query("SELECT ID,StageName,AKA1,AKA2,LegalName,SoundEx FROM performers");
IF ($R = mysql_fetch_array($setr)) {
    do {
        $RT = substr(trim($Rin), 5);
        $RT1 = substr($R[1], 5);
        $RT2 = substr($R[2], 5);
        $RT3 = substr($R[3], 5);
        $RT4 = substr($R[4], 5);
        $RTx = soundex($RT);
        IF ($RT == $RT1) {
            $RHits[] = $R[0];
        }
        IF ($RT == $RT2) {
            $RHits[] = $R[0];
        }
        IF ($RT == $RT3) {
            $RHits[] = $R[0];
        }
        IF ($RT == $RT4) {
            $RHits[] = $R[0];
        }
        IF ($RTx == $R[5]) {
            $RHits[] = $R[0];
        }
    } while ($R = mysql_fetch_array($setr));
}

The idea being that I'll build an array of the ID#'s of the near hits, which I'll populate into a select dropdown that has only hopefully fewer hits that the whole table. Which means querying for a result set from the contents of that array, in order to display the Performer's name in the SELECT dropdown and pass the ID# as the value for those choices.

Thats when I hit the 'I need to use an array in my WHERE clause' problem, and after finding that answer, I am starting to suspect I'm out of luck due to Stipulation #2 below. So I started looking at alternate search methods and I'm not sure I've gotten anywhere but more confused.

So, is there a better way to scan a single table for six fields, checking five against user input and noting the sixth for display in a subset of the original table?

Thought process:

Against the whole table, per record, test $Rin against these tests in this order:

$Rin -> StageName
$Rin -> AKA1
$Rin -> AKA2
$Rin -> LegalName
soundex($Rin) -> SoundEx

where a hit on any of the five operations adds the ID# to a result array that is used to narrow the results from 470 performers down to a reasonable list to choose from.

Stipulations:

1) As written, I know this is vulnerable to an SQL injection attack.

2) Server runs PHP 4.4.9 and MySQL 4.0.27-Standard, I can't upgrade it. I've got to prove it works before money will be spent.

3) This is hobby-level stuff, not my day job.

4) Performers often use non-English names or elements in their names, and this has led to typos and duplication by the data entry typists.

I've found a lot of mysqli and PDO answers for this sort of thing, and I'm seeing a lot of things that only half make sense (like link #4 below). I'm working on getting up to speed on these things as I try and fix whats become broken.

Places already looked:

  1. PHP mysql using an array in WHERE clause
  2. PHP/MySQL small-scale fuzzy search
  3. MySQL SubString Fuzzy Search
  4. Sophisticated Name Lookup
Community
  • 1
  • 1
Sandor Dosa
  • 133
  • 1
  • 9
  • First things first, "mysql_*" are now deperecated. Use "mysqli_*" functions instead. :) – Rav Jan 17 '17 at 01:40
  • Is mysqli available on PHP v4.4.9? – Sandor Dosa Jan 17 '17 at 01:41
  • 1
    With several hundred options and a type-ahead, I would suggest loading data via service call (AJAX) in the background after page load, and handling typeahead with a Javascript tool, e.g. [Twitter's Typeahead library](https://twitter.github.io/typeahead.js/examples/), *especially* if you're searching against multiple columns. – alttag Jan 17 '17 at 01:54
  • @SandorDosa, you're not considering an upgrade? – Rav Jan 17 '17 at 02:00
  • @Ronald Short Answer: yes. Long Answer is more complicated and involves time pressure to get this system at least looking like it's running while I teach myself how to MySQL again to the new standard. Need to have working site by 10 March this year. Upgrades can come later. – Sandor Dosa Jan 17 '17 at 02:26
  • @alttag this is an interesting tangent. Would the usability of some sort of type ahead like you suggest be viable on a server running such badly out of data software as I am currently languishing in? – Sandor Dosa Jan 17 '17 at 02:27
  • @alttag solution would outsource the overhead of the searching to the client, thus would actually reduce the overhead for the server. – Shadow Jan 17 '17 at 06:51
  • @SandorDosa, yes, it would work remarkable well on the stack you described. One server/db call is all; no fancy SQL or parsing inputs. Let the client handle it. With "only" ~500 rows, I think it'd be pretty straightforward. I know it's a different architecture than asked about, but it's robust, well-tested, and open source. One caveat is you'll have to craft your JSON manually, as `json_encode()` isn't available until PHP 5.0. I'll try to put together a longer answer for you later today. – alttag Jan 17 '17 at 15:30

1 Answers1

1

I mentioned in the comments that a Javascript typeahead library might be a good choice for you. I've found Twitter's Typeahead library and Bloodhound engine to be pretty robust. Unfortunately, the documentation is a mixed bag: so long as what you need is very similar to their examples, you're golden, but certain details (explanations of the tokenizers, for example) are missing.

In one of the several questions re Typeahead here on Stack Overflow, @JensAKoch says:

To be honest, I think twitter gave up on typeahead.js. We look at 13000 stars, a full bugtracker with no maintainer and a broken software, last release 2015. I think that speaks for itself, or not? ... So, try one of the forks: github.com/corejavascript/typeahead.js

Frankly, in a brief check, the documentation at the fork looks a bit better, if nothing else. You may wish to check it out.

Server-side code:

All of the caveats of using an old version of PHP apply. I highly recommend retooling to use PDO with PHP 5, but this example uses PHP 4 as requested.

Completely untested PHP code. json_encode() would be better, but it doesn't appear until PHP 5. Your endpoint would be something like:

headers("Content-Type: application/json");
$results = mysql_query(
   "SELECT ID,StageName,AKA1,AKA2,LegalName,SoundEx FROM performers"
);

$fields = array("ID","StageName","AKA1","AKA2","LegalName","SoundEx");

echo "[";

$first =  true;
while ($row = mysql_fetch_array($results)) {
    ($first) ? $first = false : echo ',';

    echo "\n\t,{";
    foreach($fields as $f) {
        echo "\n\t\t\"{$f}\": \"".$row[$f]."\"";
    }
    echo "\n\t}";
}
echo "]";

Client-side code:

This example uses a static JSON file as a stub for all of the results. If you anticipate your result set going over 1,000 entries, you should look into the remote option of Bloodhound. This would require you to write some custom PHP code to handle the query, but it would look largely similar to the end point that dumps all (or at least your most common) data.

var actors = new Bloodhound({
  // Each row is an object, not a single string, so we have to modify the
  // default datum tokenizer. Pass in the list of object fields to be
  // searchable.
  datumTokenizer: Bloodhound.tokenizers.obj.nonword(
    'StageName','AKA1','AKA2','LegalName','SoundEx'
  ),
  queryTokenizer: Bloodhound.tokenizers.whitespace,
  // URL points to a json file that contains an array of actor JSON objects
  // Visit the link to see details 
  prefetch: 'https://gist.githubusercontent.com/tag/81e4450de8eca805f436b72e6d7d1274/raw/792b3376f63f89d86e10e78d387109f0ad7903fd/dummy_actors.json'
});

// passing in `null` for the `options` arguments will result in the default
// options being used
$('#prefetch .typeahead').typeahead(
    {
        highlight: true
    },
   {
        name: 'actors',
        source: actors,
        templates: {
        empty: "<div class=\"empty-message\">No matches found.</div>",
        
        // This is simply a function that accepts an object.
        // You may wish to consider Handlebars instead.
        suggestion: function(obj) {
            return '<div class="actorItem">'
                + '<span class="itemStageName">'+obj.StageName+"</span>"
                + ', <em>legally</em> <span class="itemLegalName">'+obj.LegalName+"</span>"
        }
        //suggestion: Handlebars.compile('<div><strong>{{value}}</strong> – {{year}}</div>')
      },
      display: "LegalName" // name of object key to display when selected
      // Instead of display, you can use the 'displayKey' option too:
      // displayKey: function(actor) {
      //     return actor.LegalName;
      // }
});
/* These class names can me specified in the Typeahead options hash. I use the defaults here. */
    .tt-suggestion {
        border: 1px dotted gray;
        padding: 4px;
        min-width: 100px;
    }
    .tt-cursor {
        background-color: rgb(255,253,189);
    }
    
    /* These classes are used in the suggestion template */
    .itemStageName {
        font-size: 110%;
    }
    .itemLegalName {
        font-size: 110%;
        color: rgb(51,42,206);
    }
<script src="https://code.jquery.com/jquery-3.1.1.min.js"></script>
<script src="https://twitter.github.io/typeahead.js/releases/latest/typeahead.bundle.js"></script>


<p>Type something here. A good search term might be 'C'.</p>
<div id="prefetch">
  <input class="typeahead" type="text" placeholder="Name">
</div>

For ease, here is the Gist of the client-side code.

Community
  • 1
  • 1
alttag
  • 1,163
  • 2
  • 11
  • 29
  • This looks like a viable solution. I'm not as versed in javascript as I am in PHP. I'll certainly give this a thorough look. Thank you. – Sandor Dosa Jan 17 '17 at 18:09