1

After my previous question (http://stackoverflow.com/questions/8217522/best-way-to-search-for-partial-words-in-large-mysql-dataset), I've chosen Sphinx as the search engine above my MySQL database.

I've done some small tests with it, and it looks great. However, i'm at a point right now, where I need some help / opinions.

I have a table articles (structure isn't important), a table properties (structure isn't important either), and a table with values of each property per article (this is what it's all about). The table where these values are stored, has the following structure:

articleID   UNSIGNED INT
propertyID  UNSIGNED INT
value       VARCHAR(255)

The primary key is a compound key of articleID and propertyID.

I want Sphinx to search through the value column. However, to create an index in Sphinx, I need a unique id. I don't have right here. Also when searching, I want to be able to filter on the propertyID column (only search values for propertyID 2 for example, which I can do by defining it as attribute).

On the Sphinx forum, I found I could create a multi-value attribute, and set this as query for my Sphinx index:

SELECT articleID, value, GROUP_CONCAT(propertyID) FROM t1 GROUP BY articleID

articleID will be unique now, however, now I'm missing values. So I'm pretty sure this isn't the solution, right?

There are a few other options, like:

  • Add an extra column to the table, which is unique
  • Create a calculated unique value in the query (like articleID*100000+propertyID)

Are there any other options I could use, and what would you do?

Michiel van Vaardegem
  • 2,260
  • 20
  • 35

1 Answers1

1

In your suggestions

  • Add an extra column to the table, which is unique

This can not be done for an existing table with large number of records as adding a new field to a large table take some time and during that time the database will not be responsive.

  • Create a calculated unique value in the query (like articleID*100000+propertyID)

If you do this you have to find a way to get the articleID and propertyID from the calculated unique id.

Another alternative way is that you can create a new table having a key field for sphinx and another two fields to hold articleID and propertyID.

  • new_sphinx_table with following fields

    id - UNSIGNED INT/ BIGINT

    articleID - UNSIGNED INT

    propertyID - UNSIGNED INT

Then you can write an indexing query like below

SELECT id, t1.articleID, t1.propertyID, value FROM t1 INNER JOIN new_sphinx_table nt ON t1.articleID  = nt.articleID AND t1.propertyID = nt.propertyID;

This is a sample so you can modify it to fit to your requirements.

What sphinx return is matched new_sphinx_table.id values with other attributed columns. You can get result by using new_sphinx_table.id values and joining your t1 named table and new_sphinx_table

Shantha Kumara
  • 3,272
  • 4
  • 40
  • 52
  • Thans for you answer. If i create a calcuated unqiue value, it is possbile to get the both ID's back. But it's not a really nice way of course... – Michiel van Vaardegem Dec 02 '11 at 07:55
  • If you do that, you need a field 2*(INT) size field to store combined key value in database, or else whenever you re-index the database, you have to recalculate it. Think you planned to re-index your DB once a day and the DB table contains 100000 records so the computation cost will be higher. And future growth of the users would make it more difficult. – Shantha Kumara Dec 02 '11 at 09:41
  • Yes. It isn't a good option. I will wait a few days more to see if other people have other opinions about my question, before marking it as an answer. – Michiel van Vaardegem Dec 02 '11 at 09:57
  • You are correct. There will be a better solution as there is no best solution. Making a post as an answer in short time, it will discourage others posting different idea. – Shantha Kumara Dec 03 '11 at 18:42