8

I have two tables:

CREATE TABLE IF NOT EXISTS `test1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `bucket_id` int(10) unsigned NOT NULL COMMENT 'folder this component belongs to',
  `test1_name` varchar(81) NOT NULL COMMENT 'Name of this component',
  `test1_desc` varchar(1024) NOT NULL COMMENT 'Component Description',
  PRIMARY KEY (`id`),
  FULLTEXT KEY `test1_search` (`test1_name`,`test1_desc`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=3 ;

CREATE TABLE IF NOT EXISTS `bucket` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `bkt_name` varchar(81) NOT NULL COMMENT 'The name of this bucket',
  `bkt_desc` varchar(1024) NOT NULL COMMENT 'A description of this bucket',
  `bkt_keywords` varchar(512) DEFAULT NULL COMMENT 'keywords for searches',
  PRIMARY KEY (`id`),
  FULLTEXT KEY `fldr_search` (`bkt_desc`,`bkt_keywords`,`bkt_name`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=8 ;

Bucket is just a holder while test1 contains all the things that would go into a bucket. For example:

INSERT INTO `bucket` (`id`, `bkt_name`, `bkt_desc`, `bkt_keywords`) VALUES
(1, 'Simpsons', 'The Simpsons Cartoon Family was first successful adult cartoon series', 'Homer, Marge, Lisa and Bart'),
(2, 'Griffins', 'The family from the popular family guy series', 'Peter, Lois, Meg, Chris, Stewie, Brian');

INSERT INTO `test1` (`id`, `bucket_id`, `bkt_name`, `bkt_desc`) VALUES
(1, 1, 'Homer Simpson', 'Homer the figurative head of the Simpsons Family and is the husband of Marge'),
(2, 2, 'Peter Griffin', 'Peter the figurative head of the Griffin family on the hit TV seriers The family Guy');

Now, using the following query I want to look for all buckets whose name, description or keywords contain the search term "family" or whose components contain the words "family")

So far, what I have is this query and it's not returning mixed case results as in "Family" is not found while "family" is.

SELECT *
FROM bucket
RIGHT JOIN test1 ON test1.bucket_id = bucket.id
WHERE
  bucket.isvisible > 0 AND
  MATCH(bucket.bkt_keywords, bucket.bkt_desc, bucket.bkt_name)
    AGAINST('family' IN BOOLEAN MODE) OR
  MATCH(test1.test1_name, test1.test1_desc) 
    AGAINST('family' IN BOOLEAN MODE)

I should also add that all text fields have the collation of utf8_general_ci as does the entire table which is MyISAM.

ppetree
  • 826
  • 3
  • 15
  • 31
  • Your example provided has "family" in all lower case in both the bucket description and the test description. Can you make a better example to demonstrate your problem? – scwagner Feb 04 '13 at 20:25
  • 1
    FYI, the field names in your insert statement for test1 are incorrect (should be `test1_name`, `test1_desc`). Also your SELECT has bucket.isvisible > 0, which isn't included in the example table. Once I fixed those two things though, the query returned both rows, so unfortunately I don't know the cause of your actual problem. (Unless of course the whole capital thing is a red herring, and the real problem is that bucket.isvisible parameter on the simpsons row!) – Nathan Stretch Feb 19 '13 at 07:03
  • Also you could simplify the query like so: SELECT * FROM bucket RIGHT JOIN test1 ON test1.bucket_id = bucket.id WHERE MATCH(bucket.bkt_keywords, bucket.bkt_desc, bucket.bkt_name, test1.test1_desc, test1.test1_name) AGAINST('family' IN BOOLEAN MODE) – Nathan Stretch Feb 19 '13 at 07:06
  • Maybe your column has a binary collation, even though your table does not? Double-check with SHOW FULL COLUMNS FROM tablename. (from user fenway) – Lighthart Feb 21 '13 at 19:58

2 Answers2

6

I think your tables do not use utf8_general_ci as collation, but utf8_bin. I was able to reproduce the behaviour you describe after modifying the tables as follows:

ALTER TABLE test1 CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
ALTER TABLE bucket CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;

You should perhaps set your tables' collation explicitely to:

ALTER TABLE test1 CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE bucket CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

If the above changes anything, I would guess your server or session is actually set to use another collation by default (since the collation is not specified in your tables definition). This could be checked with:

SHOW GLOBAL VARIABLES LIKE 'collation_server';
SHOW SESSION VARIABLES LIKE 'collation_server';
RandomSeed
  • 29,301
  • 6
  • 52
  • 87
  • Using phpMyAdmin all the fields AND the tables show the collation as utf8_general_ci. If I export the tables the COLLATE is set to utf8_general_ci. – ppetree Feb 15 '13 at 12:12
2

The answer is apparently adding some parens around the two match against clauses.

SELECT *
FROM bucket
RIGHT JOIN test1 ON test1.bucket_id = bucket.id
WHERE bucket.isvisible > 0 AND
 ( MATCH(bucket.bkt_keywords, bucket.bkt_desc, bucket.bkt_name)
  AGAINST('family' IN BOOLEAN MODE) OR
  MATCH(test1.test1_name, test1.test1_desc) 
  AGAINST('family' IN BOOLEAN MODE) )
ppetree
  • 826
  • 3
  • 15
  • 31
  • OMG! Yes, this is because of operator precedence (http://dev.mysql.com/doc/refman/5.5/en/logical-operators.html)... Without parens, your condition was equivalent to `WHERE ( isvisible >0 AND MATCH(...) ) OR ( MATCH(...) )` – RandomSeed Feb 23 '13 at 11:07