Sphinx vs. MySql - Search through list of friends (efficiency/speed)

Question

I'm porting my application searches over to Sphinx from MySQL and am having a hard time figuring this one out, or if it even needs to be ported at all (I really want to know if it's worth using sphinx for this specific case for efficiency/speed):

users
uid uname
  1    alex
  2    barry
  3    david

friends
uid | fid
  1     2
  2     1
  1     3
  3     1

Details are:
- InnoDB
- users: index on uid, index on uname
- friends: combined index on uid,fid

Normally, to search all of alex's friends with mysql:

$uid = 1;
$searchstr = "%$friendSearch%";
$query = "SELECT f.fid, u.uname FROM friends f 
          JOIN users u ON f.fid=u.uid
          WHERE f.uid=:uid AND u.uname LIKE :friendSearch";
$friends = $dbh->prepare($query);
$friends->bindParam(':uid', $uid, PDO::PARAM_INT);
$friends->bindParam(':friendSearch', $searchstr, PDO::PARAM_STR);
$friends->execute();

Is it any more efficient to find alex's friends with sphinx vs mysql or would that be an overkill?
If Sphinx would be faster for this once the list reaches thousands of people, what would the indexing query look like? How would I remove a friendship that no longer exists from Sphinx? A detailed example for this case would help. Should I change this query to use Sphinx?

You know Sphinx is a search tech while MySQL is a storage database right?.... — Sammaye, Aug 20 '12 at 13:09
@Sammaye, so? MySQL offers search, and he's asking whether searching via MySQL is better than searching via Sphinx. Perfectly valid question. — Shlomi Noach, Aug 21 '12 at 05:13
@ShlomiNoach FTS via MySQL is not a valid method to search, trust me, try and you'll realise you will use double the resources and your queries will be dirt slow. What he should be doing is storing the relations within MySQL and then storing the searchable users in Sphinx and searching per the uid. Even though Sphinx has realtime indexes I wouldn't call them particularly stable or fast at high insertion rates. — Sammaye, Aug 21 '12 at 07:12
@ShlomiNoach Also if you check the results you receive from MySQL match compared to what you would receive from a tech like Sphinx or Solr you will realise the results MySQL gives are...weird. At the end of the day even though MySQL "offers" search that doesn't mean you should use it if you're looking for performant and decent searching capabilities. MySQL has many things that shouldn't be used if you're looking to not slow down your app, I consider its FTS to be one — Sammaye, Aug 21 '12 at 07:32
@Sammaye, I have no argument with this. On the contrary: these two last comments of yours will make for an excellent answer, whereas your first comment leaves the guy in the fog. — Shlomi Noach, Aug 21 '12 at 08:06
@ShlomiNoach Yea I agree hopefully (hopefully :P) my answer might clear that up, though I can tend to go in circles when I write long answers cos I forget what I write at the beginning but I think my answer goes in one direction, hopefully. — Sammaye, Aug 21 '12 at 08:10
@Sammaye - you'll notice that the question does not contain a single word about full text indexes. Inspecting the query posted shows that. On the other hand, Sphinx is much, much better than MySQL's MyISAM + FTS and Sphinx can be used as MySQL storage engine giving the best of both worlds. — N.B., Aug 21 '12 at 08:18
On the other hand, in order to speed up this query and avoid table scan - remove the percentage sign from the start of the name. For a few thousand entries, this is nothing, if your InnoDB is properly configured (buffer pool etc). — N.B., Aug 21 '12 at 08:22
@N.B. Indeed the question does not but it uses a FTS tech, so I assumed FTS. And yes Sphinx does have SphinxSE with primitive joining but its querying language is still a little primitive and as I said its joins are too. Though you are right it could fulfill his needs, depends really. — Sammaye, Aug 21 '12 at 08:24

Sammaye · Accepted Answer · 2012-08-21T09:28:09.410

Ok this is how I see this working.

I have the exact same problem with MongoDB. MongoDB "offers" searching capabilities but just like MySQL you should never use them unless you wanna be choked with IO, CPU and memory problems and be forced to use a lot more servers to cope with your index than you normally would.

The whole idea of using Sphinx (or another search tech) is to lower cost per server by having a performant index searcher.

Sphinx, however, is not a storage engine, and querying exact relationships across tables is not as simple. They have remedied this a little with SphinxQL, but due to the nature of the full-text index it still doesn't do an integral join like you would get in MySQL.

Instead I would store the relationships within MySQL but have an index of "users" within Sphinx.

In my website I personally have 2 indexes:

main (houses users,videos,channels and playlists)
help (help system search)

These are delta-updated once every minute. Realtime indexes are still a bit experimental at times, and I have personally seen problems with high insertion/deletion rates, so I keep to delta updates. I would use a delta index to update the main searchable objects of my site, since this is less resource-intensive and more performant than realtime indexes (from my own tests).

Do note that in order to process deletions and the like in your Sphinx collection through a delta, you will need a kill-list and certain filters for your delta index. Here is an example from my index:

source main_delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query_pre =
    sql_query = \
        SELECT id, deleted,  _id, uid, listing, title, description, category, tags, author_name, duration, rating, views, type, adult, videos, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
        FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) OR update_time >( SELECT last_index_time FROM sph_counter WHERE counter_id=1 )

    sql_query_killlist = SELECT id FROM documents WHERE update_time>=( SELECT last_index_time FROM sph_counter WHERE counter_id=1 ) OR deleted = 1
}

This processes deletions and additions once every minute which is pretty much realtime for a real web app.

So now we know how to store our indexes, I need to talk about the relationships. Sphinx (even though it has SphinxQL) won't do integral joins across data, so I would personally recommend handling the relationship outside of Sphinx. Not only that, but as I said, this relationship table will get high load, which is something that could impact the Sphinx index.

I would run a query in MySQL to pick out all the friend ids and then, using that set of ids, use the "filter" method on the Sphinx API to filter the main index down to those specific document ids. Once this is done you can search in Sphinx as normal. This is the most performant method I have found to date for dealing with this.
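A minimal sketch of that two-step flow, written in Python so it can run on its own. Several things here are assumptions and not from my setup: an in-memory SQLite table stands in for the MySQL `friends` table, the index name `users_index` is hypothetical, and instead of the API "filter" call the second step builds a SphinxQL statement with `MATCH(...)` plus an `id IN (...)` filter, which you would then send to searchd. Only string-literal escaping is handled; full-text operators in the search term are not.

```python
import sqlite3


def friend_ids(conn, uid):
    """Step 1: read the relationship from the relational store."""
    rows = conn.execute(
        "SELECT fid FROM friends WHERE uid = ? ORDER BY fid", (uid,)
    )
    return [row[0] for row in rows]


def escape_literal(text):
    """Escape backslashes and single quotes for a quoted string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def build_search(index, term, ids):
    """Step 2: full-text search restricted to the given document ids.

    Returns None when there are no ids, since nothing can match.
    """
    if not ids:
        return None
    id_list = ", ".join(str(int(i)) for i in ids)
    return (
        f"SELECT id FROM {index} "
        f"WHERE MATCH('{escape_literal(term)}') AND id IN ({id_list})"
    )


# SQLite stands in for MySQL here; the rows are the ones from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (uid INTEGER, fid INTEGER)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)", [(1, 2), (2, 1), (1, 3), (3, 1)]
)

ids = friend_ids(conn, 1)
query = build_search("users_index", "barry", ids)
print(ids)
print(query)
```

The point of the split is that MySQL answers "who are alex's friends" and Sphinx only answers "which of these documents match the text", so neither system is asked to do the other's job.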

The key thing to remember at all times is that Sphinx is a search tech while MySQL is a storage tech. Keep that in mind and you should be ok.

Edit

As @N.B said (which I overlooked in my answer), Sphinx does have SphinxSE. Although primitive and still in a sort of testing stage of its development (same as realtime indexes), it does provide an actual MyISAM/InnoDB type storage to Sphinx. This is awesome. However there are caveats (as with anything):

The language is primitive
The joins are primitive

However it can/could do the job you're looking for so be sure to look into it.

score 6 · Answer 2 · answered Aug 17 '12 at 19:46

So I'm going to go ahead and kinda outline what -I- feel the best use cases for Sphinx are, and you can kinda decide if it's more or less in line with what you're looking to do.

If all you're looking to do is a string search on one field, then with MySQL you can do wildcard searches without much trouble, and honestly, with an index on it, unless you're expecting millions of rows you are going to be fine.

Now take Facebook, which is not only indexing names, but pages etc. or even any advanced search fields. Sphinx can take in x columns from MySQL, PostgreSQL, MongoDB, (insert your db you want here) and create a searchable full-text index across all of those.

Example:

You have 5 fields (house number, street, city, state, zipcode) and you want to do a full text search across all of those. Now with MySQL you could do searches on every single one, however with sphinx you can glob them all together then sphinx does some awesome statistical findings based on the string you've passed in and the matches which are resulting from it.

This Link: PHP Sphinx Searching does a great job at walking you through what it would look like and how things work together.

So you aren't really replacing a database; you're just adding a special daemon to it (sphinx) which allows you to create specialized indexes and run your full text searches against it.

score 5 · Answer 3 · answered Aug 19 '12 at 15:05

No index can help you with this query, since you're looking for the string as an infix, not a prefix (you're looking for '%friendname%', not 'friendname%').

Moreover, the LIKE solution will get you into corners: suppose you were looking for a friend called Ann. The LIKE expression will also match Marianne, Danny etc. There's no "complete word" notion in a LIKE expression.
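To make the "Ann" problem concrete, here is a small runnable sketch. It assumes an in-memory SQLite table standing in for the MySQL `users` table (SQLite's `LIKE` is case-insensitive for ASCII by default, which is close to what a case-insensitive MySQL collation gives you); the sample names are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, uname TEXT)")
conn.executemany(
    "INSERT INTO users (uname) VALUES (?)",
    [("Ann",), ("Marianne",), ("Danny",), ("Barry",)],
)


def like_search(term):
    """Infix search in the same shape as the question's query: %term%."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT uname FROM users WHERE uname LIKE ? ORDER BY uid", (pattern,)
    )
    return [row[0] for row in rows]


matches = like_search("Ann")
print(matches)  # ['Ann', 'Marianne', 'Danny']
```

Every name that merely contains the letters "ann" comes back, which is exactly the corner described above.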

A real solution is to use a text index. A FULLTEXT index is only available on MyISAM, and MySQL 5.6 (not GA at this time) will introduce FULLTEXT on InnoDB.

Otherwise you can indeed use Sphinx to search the text.

With just hundreds or thousands, you will probably not see a big difference, unless you're really going to do many searches per second. With larger numbers, you will eventually realize that a full table scan is inferior to Sphinx search.

I'm using Sphinx a lot, on dozens and sometimes hundreds of millions of large texts, and can testify it works like a charm.

The problem with Sphinx is, of course, that it's an external tool. With Sphinx you have to tell it to read data from your database. You can do so (using crontab for example) every 5 minutes, every hour, etc. So if rows are DELETEd, they will only be removed from sphinx the next time it reads the data from table. If you can live with that - that's the simplest solution.

If you can't, there are real time indexes in Sphinx, so you may directly instruct it to remove certain rows. I am unable to explain everything in this post, so here are a couple of links for you:

Index updates

Real time indexes

As final conclusion, you have three options:

Risk it and use a full table scan, assuming you won't have high load.
Wait for MySQL 5.6 and use FULLTEXT with InnoDB.
Use Sphinx.

At this point in time, I would certainly use option #3: use sphinx.

score 1 · Answer 4 · edited May 23 '17 at 12:02

Take a look at the solution I propose here: https://stackoverflow.com/a/22531268/543814

Your friend names are probably short, and your query looks simple enough. You can probably afford to store all suffixes, perhaps in a separate table, pointing back to the original table to get the full name.

This would give you fast infix search at the cost of a little bit more storage space.
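A rough sketch of the suffix-table idea, assuming an in-memory SQLite database in place of MySQL. The table name `name_suffixes`, the index name, and the helper functions are hypothetical names chosen for illustration. The trick is that an infix match on the name becomes a prefix match on one of its stored suffixes, and a prefix pattern is the kind an ordinary B-tree index can serve.

```python
import sqlite3


def suffixes(name):
    """All suffixes of a name, lowercased: 'alex' -> alex, lex, ex, x."""
    lowered = name.lower()
    return [lowered[i:] for i in range(len(lowered))]


def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, uname TEXT)")
conn.execute("CREATE TABLE name_suffixes (suffix TEXT, uid INTEGER)")
conn.execute("CREATE INDEX idx_suffix ON name_suffixes (suffix)")

# Rows from the question's users table.
for uid, uname in [(1, "alex"), (2, "barry"), (3, "david")]:
    conn.execute("INSERT INTO users VALUES (?, ?)", (uid, uname))
    conn.executemany(
        "INSERT INTO name_suffixes VALUES (?, ?)",
        [(s, uid) for s in suffixes(uname)],
    )


def infix_search(term):
    """Find users whose name contains term, via a prefix match on suffixes."""
    pattern = escape_like(term.lower()) + "%"
    rows = conn.execute(
        "SELECT DISTINCT u.uid, u.uname "
        "FROM name_suffixes s JOIN users u ON u.uid = s.uid "
        "WHERE s.suffix LIKE ? ESCAPE '\\' "
        "ORDER BY u.uid",
        (pattern,),
    )
    return rows.fetchall()


print(infix_search("arr"))
```

Storage grows with the sum of name lengths (one row per character), which is why this only makes sense for short strings like names.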

Furthermore, to avoid finding 'Marianne' when searching for 'Ann', consider:

Using case-sensitive search. (Fragile; may break if your users enter their names or their search queries with incorrect capitalization.)
After the query, filtering your search results further, requiring word boundaries around the search term (e.g. regex \bAnn\b).
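The second option could look something like the sketch below, using Python's `re` module. The function name and sample names are made up for illustration, and it matches case-insensitively on purpose, given the fragility noted for the case-sensitive option.

```python
import re


def whole_word_matches(names, term):
    """Keep only names where term appears as a complete word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [name for name in names if pattern.search(name)]


# Pretend these came back from the infix query.
candidates = ["Ann", "Marianne", "Danny", "Ann Smith", "Mary-Ann Jones"]
filtered = whole_word_matches(candidates, "Ann")
print(filtered)  # ['Ann', 'Ann Smith', 'Mary-Ann Jones']
```

Note that a hyphen counts as a word boundary, so "Mary-Ann" still matches; whether that is what you want depends on your data.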
