0

Say I have db content with the word "supercomputer". I want to search using the following:

select * from content where content REGEXP '[[:<:]]super*[[:>:]]';

I want the user doing the search to be able to add wildcards like ? and *. I want whole word searches unless the user does as I explained above so using [ REGEX 'super' ] isn't an option.

I'd think this would work, but I guess I'm still to new to using expressions.

kenyu73
  • 671
  • 6
  • 16
  • Wouldn't it make more sense to use fulltext searching? – Ja͢ck Apr 12 '13 at 03:07
  • @Jack If i have large amounts of content, and search for 'super', I don't want 'superstar' or 'supermart', only rows with 'super strength'... unless the user searches for 'super*'. – kenyu73 Apr 12 '13 at 03:12
  • 1
    fulltext search would give higher scores for full words over partial words. – Ja͢ck Apr 12 '13 at 03:13
  • @Jack oh, wait... so enable fulltext in MySQL and use that as an index? THis is my first "real" php mySql web app, so maybe my seach thoughts on searching the data directly is wrong. – kenyu73 Apr 12 '13 at 03:15
  • @Jack I enabled fulltext on my content column and reviewed the SQL syntax to use it. Thanks for that tip. However, even using MATCH AGAINST('super*') or ('super%') doesn't return supercomputer. I'm switching to fulltext, but still need partial search ability. – kenyu73 Apr 12 '13 at 03:45
  • eh, not that simple it appears - http://stackoverflow.com/questions/295266/full-text-catalog-index-search-for-book – kenyu73 Apr 12 '13 at 03:50
  • I thought you're working on MySQL, not SQL Server? – Ja͢ck Apr 12 '13 at 03:53
  • @Jack No, my system is MySQL. I was just thinking that the same thing applied to here in some fashion. Sorry. – kenyu73 Apr 12 '13 at 03:59

2 Answers2

2

The wildcard character * means something different in regex (sort of). It denotes that a sequence may be repeated zero or more times, not that there is some text of zero or more length that it can be replaced by.

You're going to have to do some preprocessing if you want the user to be able to use wildcards like this, but you can simply replace any * with the expression: \w*. This says that you are expecting zero or more word characters, not spaces or punctuation. so the complete expression would look like:

select * from content where content REGEXP '[[:<:]]super.*[[:>:]]';

This says you want any sequence that begins with 'super' and contains only word characters and is surrounded by word boundaries.

Godwin
  • 9,739
  • 6
  • 40
  • 58
  • `select * from content where content REGEXP '[[:<:]]super.+[[:>:]]';` seems to work for both pre and post fixing. – kenyu73 Apr 12 '13 at 13:15
  • 1
    Make sure you know what both `.` and `+` mean, this expression will match `super computer` since `.` will match spaces (only really important if you want to use the match) but it will also not match the super in `my expression is super` because `+` wants one or more characters and there is not character following super in this case. I used `*` because it will match zero or more characters. – Godwin Apr 12 '13 at 18:41
  • hmm, I thought I tried `.*` and it didn't always work. For the most part, using fulltext solved most of my issues, but I still have a small need for user defined searches. Thanks for the continued feedback! If you make a full comment with your descriptions, I'll mark this questions as solved. – kenyu73 Apr 12 '13 at 23:21
  • You were correct. `supercomputer.+` was not found whereas `super.*` and `supercomputer.*` both matched. – kenyu73 Apr 12 '13 at 23:25
  • sorry, I meant update your answer. `\w` wasn't the solution, but `.*` actually is. `.*` appears to be the same as and old school `*` wildcard. – kenyu73 Apr 12 '13 at 23:52
  • @kenyu73, done. However, it worked for me so it's likely some difference in the SQL Regex engine. As long as you're not using the result, it should be the same, however if you actually use the result you might get matches that start with the word and continue on until the end of the string. If you do come across that, you can construct `\w` yourself, something like `[a-zA-Z]` should work. – Godwin Apr 12 '13 at 23:59
0

This ended up being my final PHP / MySQL solution for my search.

$bCustom = (strpos($search, "*") !== false && strpos($search, "?") !== false) ? false : true;

$sql = "SELECT content.*, users.user, users.lastname, users.firstname FROM content INNER JOIN users ON content.created_by=users.id ";

$search = (trim($get['SEARCH']) === "") ? "*" : trim($get['SEARCH']);

if ($bCustom) {
    $search = str_replace("*", ".*", $search);
    $search = str_replace("?", ".", $search);
    $sql .= "WHERE content.name REGEXP '[[:<:]]" . $search . "[[:>:]]' OR content.content REGEXP '[[:<:]]"
            . $search . "[[:>:]]' ORDER BY content.name ASC; ";
} else {
    if ($search !== "") {
        $sql .= "WHERE MATCH (name, content) AGAINST ('" . $search . "')";
    } else {
        $sql .= "ORDER BY content.name ASC; ";
    }
}
kenyu73
  • 671
  • 6
  • 16