The question is how string matching is done by the Firefox 3 URL bar to find matching entries. Substring matching on every entry might be slow. What algorithm can be used to match at any location in a fast way?
4 Answers
The algorithm the location bar uses in Firefox 3.0 is a bit complicated. It gets data from two (three for Firefox 3.5 and later) different queries:
For the first query, it checks the moz_inputhistory table to see if the current search string is stored in that table. These results are sorted by "rank", which is a number that reflects how recently the entry was used. This number decays once a day. This search is what makes the location bar adaptive to what you select over time (implemented in bug 95739).
In Firefox 3.5 and later, it then checks the moz_keyword table for bookmarks with a keyword that matches the search text exactly.
For the last query, it goes through every entry in moz_places, which includes all history visits and bookmarks. These results are ordered by frecency.
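As a rough illustration of the three-query structure, here is a sketch using Python's sqlite3. The table names come from places.sqlite, but the schemas here are heavily simplified (the real tables have many more columns), the SQL is invented for illustration, and Firefox performs the actual text matching in its own code, not with LIKE:

```python
import sqlite3

# Simplified stand-ins for the real places.sqlite tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT, title TEXT,
                             frecency INTEGER);
    CREATE TABLE moz_inputhistory (place_id INTEGER, input TEXT, use_count REAL);
    CREATE TABLE moz_keywords (keyword TEXT, place_id INTEGER);
""")
conn.executemany("INSERT INTO moz_places VALUES (?, ?, ?, ?)", [
    (1, "https://developer.mozilla.org/", "MDN Web Docs", 2000),
    (2, "https://www.mozilla.org/", "Mozilla", 1500),
])
conn.execute("INSERT INTO moz_inputhistory VALUES (1, 'moz', 1.5)")
conn.execute("INSERT INTO moz_keywords VALUES ('mdn', 1)")

def suggest(search):
    rows = []
    # Query 1: adaptive input history, best-ranked first.
    rows += conn.execute(
        "SELECT p.url FROM moz_inputhistory i "
        "JOIN moz_places p ON p.id = i.place_id "
        "WHERE i.input LIKE ? || '%' ORDER BY i.use_count DESC",
        (search,)).fetchall()
    # Query 2 (Firefox 3.5+): bookmark keywords, exact match only.
    rows += conn.execute(
        "SELECT p.url FROM moz_keywords k "
        "JOIN moz_places p ON p.id = k.place_id WHERE k.keyword = ?",
        (search,)).fetchall()
    # Query 3: all history and bookmarks, ordered by frecency.
    rows += conn.execute(
        "SELECT url FROM moz_places WHERE url LIKE '%' || ? || '%' "
        "OR title LIKE '%' || ? || '%' ORDER BY frecency DESC",
        (search, search)).fetchall()
    seen, out = set(), []
    for (url,) in rows:  # de-duplicate, keeping earlier-query order
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out
```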
For all three of these cases, the following algorithm is used to match against the tags, title, and URL (called the "searchable text" below). This is a bit difficult to explain in words, so it might be easier to look at the source code.
- The search string is broken into tokens determined by whitespace (each non-whitespace "word" is a token).
- For each token, compare the searchable text and the token character by character in a Unicode, case-insensitive way until the token matches completely. If a character does not match, advance to the next "word boundary" in the searchable text and try again.
- If a match is found in any one of the searchable texts, the entry will show up in the location bar.
- If we do not have enough results (the default is 12), we redo the query, going through every bookmark and history visit and testing each token against the searchable text in a Unicode, case-insensitive way at any position (not just at word boundaries).
Hopefully that explains it in an understandable way!
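The steps above can be sketched in Python. This is a simplification, not the actual Mozilla code: the word-boundary definition and the Unicode case handling here are rough approximations.

```python
import re

def matches(token, text, anywhere=False):
    """Does one search token match the searchable text? Simplified:
    lowercase instead of full Unicode case-insensitivity, and word
    boundaries approximated by whitespace and common URL punctuation."""
    token, text = token.lower(), text.lower()
    if anywhere:
        return token in text
    words = re.split(r"[\s/.:?#&=_-]+", text)
    return any(w.startswith(token) for w in words)

def autocomplete(search, entries, wanted=12):
    tokens = search.split()
    # First pass: every token must match at a word boundary.
    results = [e for e in entries
               if all(matches(t, e) for t in tokens)]
    # Too few results: redo the search, matching tokens anywhere.
    if len(results) < wanted:
        for e in entries:
            if e not in results and all(matches(t, e, anywhere=True)
                                        for t in tokens):
                results.append(e)
    return results[:wanted]
```

The two-pass structure is also consistent with the observation in the comments that 'firefo' matches quickly while 'irefo' takes longer: the latter can only match in the second, anywhere pass.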
- Looks like the link to the source code is broken. – Matt Huggins Jun 23 '21 at 23:01
The normal way to do fast substring matching is to create a data structure that contains all the suffixes of all the strings you want to search. Depending on the organization, this is called a "suffix tree" or "suffix array". For example, if you have 1,000 strings and every one is 50 characters long, you have 1,000 × 50 nontrivial suffixes, i.e. your suffix array would have 50,000 entries.
Then, to do the matching, you do a binary search (if an array) or a tree search (if a tree) to find all the suffixes in the data structure whose beginning matches the string typed in the search box. Because you are matching the beginning of each suffix, you can use standard search procedures (binary search, tree descent) to get to the result fast. Every suffix is linked to the strings in which it appears.
Example: you have two strings, "CAT" and "DOT". Your suffix array looks like this (note the lexicographic = alphabetic ordering):
#1 AT --> CAT
#2 CAT --> CAT
#3 DOT --> DOT
#4 OT --> DOT
#5 T --> DOT, CAT
Note that there are six suffixes, but two of them are the same (the last "T" in "CAT" and "DOT"), and both are represented by the same entry (#5).
Now when the user types a search, e.g. "OT" (which should match "DOT"), you can do a simple lexicographic lookup in log time, as you are now searching for a beginning match in the suffix array.
This is the basic principle of fast text searching when the search pattern does not contain wildcards.
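The CAT/DOT example above can be sketched directly in Python. One simplification: where the answer collapses the shared suffix "T" into a single entry linked to both strings, this sketch keeps one (suffix, source) pair per occurrence.

```python
from bisect import bisect_left

def build_suffix_array(strings):
    # One (suffix, source-string) pair per suffix position,
    # sorted lexicographically.
    return sorted((s[i:], s) for s in strings for i in range(len(s)))

def find(suffix_array, pattern):
    # Binary-search for the first suffix >= pattern, then walk forward
    # collecting every suffix that begins with the pattern.
    lo = bisect_left(suffix_array, (pattern,))
    hits = set()
    while lo < len(suffix_array) and suffix_array[lo][0].startswith(pattern):
        hits.add(suffix_array[lo][1])
        lo += 1
    return sorted(hits)
```

For example, `find(build_suffix_array(["CAT", "DOT"]), "OT")` returns `["DOT"]`, and searching for `"T"` returns both strings, matching entry #5 above.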

- This is only necessary if you want to find matches in the middle of a word. If you're happy with matching from the beginning of a word, an ordinary inverted index is sufficient. – Nick Johnson Jul 29 '09 at 22:10
- Based on my own tests, 'firefo' turns up a result very quickly (and incrementally), while 'irefo' takes a while. I'm guessing it uses an inverted index initially, and performs a brute-force scan of the whole db if nothing is returned. – Nick Johnson Jul 29 '09 at 22:11
- While this is a nice explanation, it's not at all how Firefox does this. The memory usage and disk usage would be quite large for most users. Assuming 1000 history entries (very conservative), and a typical url being 100-1000 characters long, that is 1,000,000 - 10,000,000 array entries. With those numbers, you are looking at a minimum of 50MB for the table. – sdwilsh Jul 30 '09 at 18:37
- 50MB isn't large unless you're stuck in the '90s. And 10 million URLs each over 100 characters long is a very optimistic limit - visiting 10 million URLs at 1 URL a second would take 115 days. Besides, what do you propose Firefox does, if not storing them somewhere? – Nick Johnson Jul 31 '09 at 08:25
- Yeah, Firefox stores the browsing history anyway, and the awesomebar is linked to the browsing history. And the browsing history needs to be stored also, even if there is no awesomebar! – Antti Huima Aug 03 '09 at 15:35
- 50MB is a conservative estimate, and may not seem like much. However, when your application only [uses around 100-200MB normally][1], you are looking at a 25-50% increase in memory usage. People already complain that Firefox uses too much memory. All this still doesn't change the fact that this isn't how Firefox does its lookup. [1]: http://blog.pavlov.net/2008/03/11/firefox-3-memory-usage/ – sdwilsh Aug 29 '09 at 17:17
The awesomebar suggests URLs using an algorithm called the Places frecency algorithm.
According to the Mozilla developer website:
The word "frecency" itself is a combination of the words "frequency" and "recency."
- That is just how the data is sorted and has very little to do with what results are displayed. – sdwilsh Jul 30 '09 at 18:39
- True, but the question has been changed since it was originally posted. So my answer had to do with the original question and might look a bit off... – RuudKok Jul 31 '09 at 06:38
I think this is left to the underlying storage: the SQLite database where Firefox stores the pages you visited offers fast functions for substring comparison. I think Firefox issues a SQL query to the database. This is quite fast because the database is cached in memory.
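For what it's worth, the pure-SQL approach this answer describes would look something like the sketch below (with an invented mini-schema; as a comment notes, this is not what Firefox actually does). Note that a LIKE pattern with a leading wildcard cannot use an index, so it scans the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moz_places (url TEXT, title TEXT)")
conn.executemany("INSERT INTO moz_places VALUES (?, ?)", [
    ("https://www.mozilla.org/firefox", "Firefox"),
    ("https://example.com/", "Example Domain"),
])

def lookup(search):
    # Leading and trailing wildcards match the search string at any
    # position; LIKE is case-insensitive for ASCII in SQLite.
    pattern = "%" + search + "%"
    return [url for (url,) in conn.execute(
        "SELECT url FROM moz_places WHERE url LIKE ? OR title LIKE ?",
        (pattern, pattern))]
```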

- That's not right at all. The data is in fact obtained from SQLite, but the matching is done with a completely different algorithm. – sdwilsh Jul 30 '09 at 18:41