1

I am trying to achieve a filter in Sphinx so that I can filter out the results from two MVA attributes (what I need is a double condition, but since MVA is only 2 fields..) from the same joined table, but whatever I do, It seems that I keep getting the same results since there is always a match on the first field in the multi attribute?

Maybe there's another solutions but I can't seem to find out how to get my desired result, I got to simplify 4 tables: PRODUCT, CATEGORY, PRICE and POSITION.

Data set-up

Below a sample of the data structure:

PRODUCT
+------+-------------+-----------------+---------------+
|  ID  | CATEGORY_ID | MANUFACTURER_ID |     TITLE     |
+------+-------------+-----------------+---------------+
| 1000 |        1000 |            1000 | Apple iPhone  |
| 1001 |        1000 |            1000 | Apple iPad    |
| 1002 |        1000 |            1000 | Apple iPod    |
| 1003 |        1001 |            1001 | Do not show   |
+------+-------------+-----------------+---------------+

CATEGORY
+------+-------+
|  ID  | TITLE |
+------+-------+
| 1000 | Apple |
| 1001 | Other |
+------+-------+

PRICE
+------+--------+---------+
|  ID  | USERID |  PRICE  |
+------+--------+---------+
| 1000 |   1000 |  359.00 |
| 1001 |   1001 | 1058.30 |
| 1002 |   1002 | 1078.00 |
| 1003 |   1003 | 1160.45 |
| 1004 |   1004 | 1180.00 |
| 1005 |   1000 | 1190.00 |
| 1006 |   1000 |  228.76 |
+------+--------+---------+

POSITION
+------+------------+--------+------+
|  ID  | PRODUCT_ID | USERID | RANK |
+------+------------+--------+------+
| 1000 |       1000 |   1000 |    1 |
| 1001 |       1001 |   1001 |    1 |
| 1002 |       1001 |   1002 |    2 |
| 1003 |       1001 |   1003 |    3 |
| 1004 |       1001 |   1004 |    4 |
| 1005 |       1001 |   1000 |    5 |
| 1006 |       1002 |   1000 |    1 |
+------+------------+--------+------+

Sphinx Set-up:

source product
{
        type                            = mysql

        sql_host                        = localhost
        sql_user                        = ...
        sql_pass                        = ...
        sql_db                          = ...

        sql_query_pre                   = SET NAMES utf8
        sql_query                       = SELECT P.ID AS ID, C.ID AS SEARCH_CAT_ID, P.CATEGORY_ID, P.MANUFACTURER_ID, P.TITLE AS TITLE_SORT FROM CATEGORY C, PRODUCT P WHERE P.CATEGORY_ID=C.ID
        sql_attr_uint                   = CATEGORY_ID
        sql_attr_uint                   = MANUFACTURER_ID
        sql_attr_string                 = TITLE_SORT
        sql_attr_multi                  = uint POS_RANK from query; SELECT PRODUCT_ID, RANK FROM POSITION
        sql_attr_multi                  = uint POS_USERID from query; SELECT PRODUCT_ID, USERID FROM POSITION
}
index product
{
        source                   = product
        path                     = ...
        docinfo                  = extern
        min_word_len             = 1
}

MySQL Dump:

CREATE TABLE IF NOT EXISTS `CATEGORY` (
  `ID` int(11) NOT NULL,
  `TITLE` varchar(255) NOT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `CATEGORY` (`ID`, `TITLE`) VALUES
(1000, 'Apple'),
(1001, 'Other');

CREATE TABLE IF NOT EXISTS `POSITION` (
  `ID` int(11) NOT NULL,
  `PRODUCT_ID` int(11) NOT NULL,
  `USERID` int(11) NOT NULL,
  `RANK` int(11) NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `USERID` (`USERID`),
  KEY `PRODUCT_ID` (`PRODUCT_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `POSITION` (`ID`, `PRODUCT_ID`, `USERID`, `RANK`) VALUES
(1000, 1000, 1000, 1),
(1001, 1001, 1001, 1),
(1002, 1001, 1002, 2),
(1003, 1001, 1003, 3),
(1004, 1001, 1004, 4),
(1005, 1001, 1000, 5),
(1006, 1002, 1000, 1);

CREATE TABLE IF NOT EXISTS `PRICE` (
  `ID` int(11) NOT NULL,
  `USERID` int(11) NOT NULL,
  `PRICE` decimal(9,2) NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `USERID` (`USERID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `PRICE` (`ID`, `USERID`, `PRICE`) VALUES
(1000, 1000, '359.00'),
(1001, 1001, '1058.30'),
(1002, 1002, '1078.00'),
(1003, 1003, '1160.45'),
(1004, 1004, '1180.00'),
(1005, 1000, '1190.00'),
(1006, 1000, '228.76');

CREATE TABLE IF NOT EXISTS `PRODUCT` (
  `ID` int(11) NOT NULL,
  `CATEGORY_ID` int(11) NOT NULL,
  `MANUFACTURER_ID` int(11) NOT NULL,
  `TITLE` varchar(255) NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `CATEGORY_ID` (`CATEGORY_ID`),
  KEY `MANUFACTURER_ID` (`MANUFACTURER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `PRODUCT` (`ID`, `CATEGORY_ID`, `MANUFACTURER_ID`, `TITLE`) VALUES
(1000, 1000, 1000, 'Apple iPhone'),
(1001, 1000, 1000, 'Apple iPad'),
(1002, 1000, 1000, 'Apple iPod'),
(1003, 1001, 1001, 'Do not show');

ALTER TABLE `POSITION`
  ADD CONSTRAINT `POSITION_ibfk_1` FOREIGN KEY (`PRODUCT_ID`) REFERENCES `POSITION` (`ID`) ON DELETE CASCADE;

ALTER TABLE `PRODUCT`
  ADD CONSTRAINT `PRODUCT_ibfk_1` FOREIGN KEY (`CATEGORY_ID`) REFERENCES `CATEGORY` (`ID`) ON DELETE CASCADE;

Test 1

SphinxQL Query for filter USERID:

SELECT ID FROM product WHERE MATCH('1000') AND POS_USERID IN (1000) ORDER BY WEIGHT() DESC LIMIT 0,20 FACET POS_RANK LIMIT 5;

Result for USERID (first set seems OK this time, second returns everything):

+------+
|  id  |
+------+
| 1000 |
| 1001 |
| 1002 |
+------+

+----------+----------+
| pos_rank | count(*) |
+----------+----------+
|        5 |        1 |
|        4 |        1 |
|        3 |        1 |
|        2 |        1 |
|        1 |        3 |
+----------+----------+

What I expected:

  • Product ID (1000,1001,1002)
  • Position Rank (1 with count 3, 5 with count 1)

Test 2

SphinxQL Query for filter USERID 1000 and RANK 5:

SELECT ID FROM product WHERE MATCH('1000') AND POS_USERID IN (1000) AND POS_RANK IN (5) ORDER BY WEIGHT() DESC LIMIT 0,20 FACET POS_RANK LIMIT 5;

Result for USERID and RANK (first set seems OK this time, second returns everything):

+------+
|  id  |
+------+
| 1001 |
+------+

+----------+----------+
| pos_rank | count(*) |
+----------+----------+
|        5 |        1 |
|        4 |        1 |
|        3 |        1 |
|        2 |        1 |
|        1 |        1 |
+----------+----------+

What I expected:

  • Product ID (1001)
  • Position Rank (5 with count 1)

Test 3

SphinxQL Query for filter USERID 1000 and RANK 4:

SELECT ID FROM product WHERE MATCH('1000') AND POS_USERID IN (1000) AND POS_RANK IN (4) ORDER BY WEIGHT() DESC LIMIT 0,20 FACET POS_RANK LIMIT 5;

Result for USERID and RANK (first stil returns product, second returns everything):

+------+
|  id  |
+------+
| 1001 |
+------+

+----------+----------+
| pos_rank | count(*) |
+----------+----------+
|        5 |        1 |
|        4 |        1 |
|        3 |        1 |
|        2 |        1 |
|        1 |        1 |
+----------+----------+

What I expected:

  • Product ID (empty set)
  • Position Rank (empty set)

I hope you guys understand what I am trying to achieve and can help me out?

Jordy
  • 85
  • 13
  • Hi. What's your field in the source? You do "WHERE MATCH('1000')", but there's no field, only attributes. Sphinx wouldn't allow you to build the index w/o a field. – Manticore Search Jul 31 '18 at 10:59
  • Interesting.. I did not configurated any field indeed but it is matching on C.ID AS SEARCH_CAT_ID since this is the only not configured field in this case, I am using SphinxQL directly in PHP and with SSH I am testing with mysql --port=9306 and I do get the results back in exact amounts what I expected and indexed, checked by SHOW META in the end and the counts are correct. Do you understand my intention with the filter? Can you help me further? – Jordy Jul 31 '18 at 12:16
  • I don't know if that is something version related (I am running 2.9.9) becouse you say sphinx wouldn't allow to build index w/o field but I tried with a field, still the same result.. I added the field: sql_field_string = SEARCH_CAT_ID and then match "WHERE MATCH('@SEARCH_CAT_ID=1216')". any ideas to get my disired result? – Jordy Jul 31 '18 at 13:24
  • "I am running 2.9.9" - don't you mean 2.1.9? In test 1 you expected "Position Rank (1 with count 3, 5 with count 1)". It works this way in Manticore v2.7.1. I'm not sure why it works differently in your version. Test 2: you expected "Position Rank (5 with count 1)", why 5, not 4 or 3 or 2 or 1? All of them are in the MVA pos_rank of the document with id 1001 according to your dump. Test 3: you expected empty result sets, but try "SELECT * FROM product WHERE MATCH('1000') AND POS_USERID IN (1000) AND POS_RANK IN (5)" and you'll see that there's a reason why the results sets are not empty – Manticore Search Jul 31 '18 at 16:40
  • Perhaps you want to use ALL() in Test 3 like this "SELECT * FROM product WHERE MATCH('1000') AND ALL(POS_USERID) IN (1000) AND ALL(POS_RANK) IN (5);". This will give you empty result sets. Otherwise ANY() is supposed. The both were introduced in 2.2.1, so if you use 2.1.9 you may want to upgrade to use ALL(). – Manticore Search Jul 31 '18 at 16:42
  • The version was a typo indeed, I am running 2.2.9. Test 1: I am expecting to only get RANK 1 and 5 since my intention was to first filter out by USERID with the "AND POS_USERID IN (1000)" so the only ranks associated with USERID 1000 are 1 and 5 for product 1000,1001,1002. Test 2: I am trying to filter for both USERID 1000 and RANK 5, so I only need RANK 5 back with the associated count for product 1001. Test 3: I am expecting an empty set since there is no RANK 4 on any product for USERID 1000 but it is matching on the POS_RANK for product 1001 so it returns all counts and product 1001. – Jordy Aug 01 '18 at 06:57
  • Test 1: the fact that you have USER_ID in POSITION doesn't change anything since your pos_rank in the Sphinx config is "sql_attr_multi = uint POS_RANK from query; SELECT PRODUCT_ID, RANK FROM POSITION", it doesn't know anything about POSITION.USER_ID, so "WHERE MATCH('1000') AND POS_USERID IN (1000)" finds you 3 documents and doc 1001 has pos_rank "1,2,3,4,5", the other 2 have pos_rank "1", so 1-5 are shown in the facet and for pos_rank=1 the count is 3. – Manticore Search Aug 01 '18 at 10:55
  • Test 2: you find the doc 1001 by your query and then break down by its pos_ranks. There's no relationship between it's pos_rank and pos_userids, you just find the doc and then the FACET works with the returned set/sets – Manticore Search Aug 01 '18 at 10:59
  • Test 3: "there is no RANK 4 on any product for USERID 1000". Document 1001 has pos_rank 4 and at the same time pos_userid 1000. That's why it's returned. Try select * instead of select ID to see clearer why this or that result is returned. – Manticore Search Aug 01 '18 at 11:03
  • I see why I get those results back but is there another way to achieve what I want then? Becouse the data in pre-indexed offcurse I need to filter trough it and can not set a WHERE condition after "sql_attr_multi = uint POS_RANK from query; SELECT PRODUCT_ID, RANK FROM POSITION" for a specific USERID to get the associated RANKS back? – Jordy Aug 01 '18 at 12:52
  • I'm afraid if you want to control the data fully the only thing you can do is to denormalize th data you store in Sphinx so you can sort and group by whatever attributes you want – Manticore Search Aug 08 '18 at 08:46

0 Answers0