7

I'm using a django application which does some 'startswith' ORM operations comparing longtext columns with a unicode string. This results in a LIKE BINARY comparison operation with a u'mystring' unicode string. Is a LIKE BINARY likely to be any slower than a plain LIKE?

I know the general answer is benchmarking, but I would like to get a general idea for databases in general rather than just my application as I'd never seen a LIKE BINARY query before.

I happen to be using MySQL but I'm interested in the answer for SQL databases in general.

kdt
  • 27,905
  • 33
  • 92
  • 139

3 Answers3

7

If performance seems to become a problem, it might be a good idea to create a copy of the first eg. 255 characters of the longtext, add an index on that and use the startswith with that.

BTW, this page says: "if you need to do case-sensitive matching, declare your column as BINARY; don't use LIKE BINARY in your queries to cast a non-binary column. If you do, MySQL won't use any indexes on that column." It's an old tip but I think this is still valid.

tuomassalo
  • 8,717
  • 6
  • 48
  • 50
  • 1
    confirmed this behavior in mysql 5.5.31. For django, this means it's important to use __istartswith instead of __startswith for good performance. – Julian Jul 15 '13 at 23:13
2

For the next person who runs across this - in our relatively small database the query:

SELECT * FROM table_name WHERE field LIKE 'some-field-search-value';

... Result row

Returns 1 row in set (0.00 sec)

Compared to:

SELECT * FROM table_name WHERE field LIKE BINARY 'some-field-search-value';

... Result row

Returns 1 row in set (0.32 sec)

Long story short, at least for our database (MySQL 5.5 / InnoDB) there is a very significant difference in performance between the two lookups.

Apparently though this is a bug in MySQL 5.5: http://bugs.mysql.com/bug.php?id=63563 and in my testing against the same database in MySQL 5.1 the LIKE BINARY query still uses the index (while in 5.5 it does a full table scan.)

philipk
  • 1,473
  • 16
  • 22
2

A trick: If you don't want to change the type of your column to binary, try to write your ‍WHERE statement like this:

WHERE field = 'yourstring' AND field LIKE BINARY 'yourstring'

instead of:

WHERE field LIKE BINARY 'yourstring'

Indeed, it will check the first condition very quickly, and try the second one only if the first one is true.

It worked well on my project for this test of equality, and I think you can adapt this to the "starts with" test.

Amir Shabani
  • 3,857
  • 6
  • 30
  • 67
jmath
  • 21
  • 1
  • But one may do case folding and the other not?? – Rick James Apr 23 '19 at 23:33
  • 1
    Well, it depends whether your collation is case sensitive or not. But if your collation is case insensitive and you don't want to change it, you may have very bad performance using directly "LIKE BINARY" because your index wouldn't be used. Indeed, the index depends from the collation. With "=", the engine use the index, whereas with "LIKE BINARY", it needs to scan all the table and convert every line to binary so as to compare it to your string. – jmath Apr 26 '19 at 16:06