6

I'm returning mobile phone data based on the user agent. But in an instance where the useragent is not stored (newer version of phone or software) I want to be able to return the closest match, a bit like how Google displays the "did you mean this". i.e.

if I have a stored useragent of

Mozilla/5.0 (Linux; U; Android 2.1-update1; en-nl; Desire_A8181 Build/ERE27) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17

and the user agent in use is

Mozilla/5.0 (Linux; U; Android 2.1-update1; en-nl; Desire_G45H Build/ERE27) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17

I want to be able to return the one stored to add or adapt my data accordingly.

Does anyone know a way of achieving this?

Phil Jackson
  • 10,238
  • 23
  • 96
  • 130
  • If you want to develop your own system that fine by me, but.... Did you take a look at http://docs.php.net/function.get_browser and the usually faster-than-light updated browscap.ini at http://browsers.garykeith.com/downloads.asp ? – VolkerK Jul 08 '11 at 09:21
  • If you only want to take care of this on database level, you should remove the php tag. – user247702 Jul 08 '11 at 09:58

5 Answers5

4

use full text searching with most relevant data...

SELECT * MATCH(browser) AGAINST ('your browser') AS score order by score DESC
Avinash
  • 6,064
  • 15
  • 62
  • 95
  • interesting, would have to use a limit though right as it would still return all results just in order? Will defiantly look int this. – Phil Jackson Jul 08 '11 at 09:31
  • Be aware, however, that the MySQL fulltext index is a beast with some problems - for example, minimum word length is 4 chars: in "530.17", it would automatically always ignore the ".17" part. Also, the closest match criteria is a little flawed. It works alright for natural texts, but maybe not for user agents. – Steffen Jul 08 '11 at 09:48
  • you can change the minimum word length and disable stopwords, which are an even bigger pain. see http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html – user428517 Feb 21 '13 at 19:03
2

The usual approach for fuzzy string matching are things like calculating the levenshtein distance or implementing it as an n-gram search index. But for matching user agents, this is overkill.

Rather reduce the string you search for to certain important criterias, then do something like

SELECT * FROM agents WHERE agent LIKE "Mozilla/5.0 (Linux; U; Android%) AppleWebKit/5% Version/4.0 Mobile Safari/5%"

So, you strip out certain too detailed parts and replace them by % in your LIKE statement. You should, however, reconsider the architecture - I would only save the important parts and leave out the exact build number etc. Also consider using an external library that already contains user agents and does the matching for you, no need to reinvent the wheel.

EDIT: just as VolkerK pointed out above, the "external library" should be PHPs getbrowser. Just added for compeleteness of the answer ;-)

Steffen
  • 2,235
  • 13
  • 19
  • I have a database of 6,549 mobile devices all with multiple user agents against them, I'm not re-inventing the wheel, just smoothing off the edges. Thanks for the above method, will give it a try. Regards – Phil Jackson Jul 08 '11 at 09:27
0

Take a look at one of the scripts on Highlight the difference between two strings in PHP, with some modifications you should be able to get a percentage of the difference.

Community
  • 1
  • 1
user247702
  • 23,641
  • 15
  • 110
  • 157
  • That's two strings, I could write my own function for that or use levenshtein() but this is a matter of sql efficiency and was wondering if there was a sql equivalent – Phil Jackson Jul 08 '11 at 09:29
0

Use full-text search; otherwise these functions may help:

seriousdev
  • 7,519
  • 8
  • 45
  • 52
0

To get the best result, you can extract the whole records and do a cycle to guess who's the most similar word

check these functions

SIMILAR_TEXT

SOUNDEX

LEVENSHTEIN

I.E.

$most_similar = "";
$highestPercentage = 0;
foreach ($ua in $all_user_agents) {
    $i = similar_text($current_user_agent, $ua, &$p);
    if ($p > $highestPercentage) {
        $highestPercentage = $p;
        $most_similar = $ua;
    }
}

echo "most similar = $most_similar"

Anyway, you can use the mySQL Fulltext-search following some guidelines like this one

VAShhh
  • 3,494
  • 2
  • 24
  • 37