I have a database of articles that I want to search. I had been using the normal Django ORM for this, which was getting way too slow, and then I learned a little about indexes in Django. I'm using MySQL, and I now know that with MySQL I cannot put an index on a TextField, as described here in this Stack Overflow question, which is the problem I was facing. However, in my case I can't change the field to a CharField.

I was reading through the MySQL docs, which state:

MySQL cannot index LONGTEXT columns specified without a prefix length on the key part, and prefix lengths are not permitted in functional key parts.

My understanding was that the problem arises because a Django TextField is a LONGTEXT column in MySQL. I came across the Django-MySQL package here and thought that if I could use it to change the LONGTEXT to a MEDIUMTEXT, this might be resolved. So I updated my model like this:

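from django.db.models import Model
from django_mysql.models import SizedTextField

class MyModel(Model):
    ........
    document = SizedTextField(size_class=3)  # size_class=3 is MEDIUMTEXT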

However, I still see the same error when running python manage.py makemigrations:

django.db.utils.OperationalError: (1170, "BLOB/TEXT column 'document' used in key specification without a key length")

How can I go about resolving this?

Dipanshu Juneja
  • Please show us the `SELECT` that you hope will speed up via the index. This will help us discuss `FULLTEXT` versus "prefix" versus some other solution. – Rick James Feb 20 '22 at 16:40
  • @RickJames I'm simply returning all the articles that contain a given word passed by the client. So it would be something like `SELECT * FROM articles WHERE text CONTAINS searchword` – Dipanshu Juneja Feb 21 '22 at 16:49

2 Answers

All of these related types, TEXT, MEDIUMTEXT, and LONGTEXT, are too large to be indexed without specifying a prefix. An index prefix means that only the first N characters of the string are included in the index. Like this:

create table mytable (
  t text, 
  index myidx (t(200))
);

The prefix length in this example is 200 characters, so only the first 200 characters of each value are included in the index. Usually this is enough to help performance, unless you have a large number of strings that are identical in their first 200 characters.

The longest prefix that MySQL supports depends on the storage engine and the row format. With InnoDB's older REDUNDANT and COMPACT row formats, an index prefix can be at most 767 bytes, which means fewer characters if you use a multi-byte character set like utf8 or utf8mb4. Recent versions of MySQL default to the more modern DYNAMIC row format, which supports up to 3072 bytes for an index prefix. Again, the number of characters is that limit divided by 3 or 4 bytes per character.
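To make the byte-to-character arithmetic concrete, here is a minimal sketch. It assumes the InnoDB limits of 767 bytes (REDUNDANT and COMPACT row formats) and 3072 bytes (DYNAMIC and COMPRESSED row formats). The function name and the lookup table are hypothetical helpers for illustration, not part of MySQL or Django.

```python
# Worst-case bytes MySQL reserves per character for each character set.
# (Illustrative table; only a few common character sets are listed.)
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_prefix_chars(byte_limit: int, charset: str) -> int:
    """Return the longest index prefix, in characters, that fits in byte_limit."""
    return byte_limit // MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for limit in (767, 3072):
        for charset in ("utf8mb3", "utf8mb4"):
            print(limit, charset, max_prefix_chars(limit, charset))
```

With utf8mb4, this works out to 191 characters under the older limit and 768 characters under the newer one, so the 200-character prefix in the example above only fits with the newer row formats.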

I'm not a regular Django user, so I tried to skim the documentation about defining indexes on model classes. But given a few seconds of reading, I don't see an option to declare a prefix for an index on a long string column.

I think your options are one of the following:

  • Change the column to a shorter string column that can be indexed
  • Create the index using the MySQL client, not using Django migrations
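
For the second option, the statements to run could be generated along the lines below. This is only a sketch: the table name `myapp_mymodel`, the index name, and the helper names are hypothetical, and the column `document` is taken from the question. It also assumes the model field no longer declares an index itself, since otherwise the migration would still fail.

```python
import re

# Accept only plain identifiers so nothing unexpected ends up inside the SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _quote(name: str) -> str:
    """Backtick-quote a MySQL identifier after a conservative sanity check."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return f"`{name}`"


def prefix_index_sql(table: str, column: str, index_name: str, prefix_len: int):
    """Return (create_sql, drop_sql) for a prefix index on a TEXT column."""
    if prefix_len <= 0:
        raise ValueError("prefix_len must be positive")
    create = (
        f"CREATE INDEX {_quote(index_name)} ON {_quote(table)} "
        f"({_quote(column)}({prefix_len}))"
    )
    drop = f"DROP INDEX {_quote(index_name)} ON {_quote(table)}"
    return create, drop


if __name__ == "__main__":
    # Hypothetical names; replace with the real table and index names.
    create, drop = prefix_index_sql(
        "myapp_mymodel", "document", "document_prefix_idx", 200
    )
    print(create)
    print(drop)
```

The resulting strings can be pasted into the MySQL client. Django's `migrations.RunSQL` operation also accepts raw SQL with an optional `reverse_sql`, so the same pair could go into a hand-written migration if you prefer to keep schema changes in one place.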
Bill Karwin
  • This is helpful, thanks Bill. Yes, in my case I do have a large number of similar articles, so setting a prefix might not be an option. Changing to a shorter string column (CharField in Django) may also be difficult, since I do not enforce a length limit at the client end. However, articles are around 10-20 KB on average, so if CharField supports this size on MySQL, this may work. Lastly, I will have to explore the last option you recommended, creating an index using the MySQL client directly, since I had only been using Django for db operations. – Dipanshu Juneja Feb 20 '22 at 05:39
  • Do you expect to search your articles for keywords, not for the entire exact text? If so, then creating an index wouldn't help anyway. You should use a [fulltext index](https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html). I don't know if Django supports creating those kinds of indexes through the model classes, or if you'd have to create it manually. – Bill Karwin Feb 20 '22 at 07:11
  • I'm simply returning all the articles that contain a given keyword by the client application. Fulltext index looks relevant, I found this [answer](https://stackoverflow.com/questions/2248743/django-mysql-full-text-search) on how to do this with Django by adding a lookup which seems interesting. – Dipanshu Juneja Feb 21 '22 at 16:44

returning all the articles that contain a given word passed by the client. So would be something SELECT * from articles WHERE text CONTAINS searchword

Add

FULLTEXT(text)

and use

WHERE MATCH(text) AGAINST("searchword")

or perhaps

WHERE MATCH(text) AGAINST("+searchword" IN BOOLEAN MODE)

It will run very fast. There are caveats -- short words and "stop" words (like "the") are ignored.

(If Django cannot facilitate that, then you have to do it with "raw SQL".)
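
As a rough sketch of the raw SQL route, the query and its parameter could be assembled in Python like this. The table `articles` and column `text` come from the example query in the question. The helper names are hypothetical, and stripping boolean-mode operators from user input is my own assumption about what you would want, not something MySQL requires.

```python
import re

# Characters that act as operators in MySQL boolean full-text mode.
_BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def boolean_term(user_input: str) -> str:
    """Strip boolean-mode operators, then require every remaining word with '+'."""
    words = _BOOLEAN_OPERATORS.sub(" ", user_input).split()
    if not words:
        raise ValueError("search input is empty after cleaning")
    return " ".join("+" + w for w in words)


def fulltext_query(table: str, column: str, user_input: str):
    """Return (sql, params) for a boolean-mode MATCH ... AGAINST search."""
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    # The search term is passed as a parameter (%s), never interpolated.
    sql = (
        f"SELECT * FROM `{table}` "
        f"WHERE MATCH(`{column}`) AGAINST (%s IN BOOLEAN MODE)"
    )
    return sql, [boolean_term(user_input)]


if __name__ == "__main__":
    sql, params = fulltext_query("articles", "text", "searchword")
    print(sql)
    print(params)
```

In Django, the pair could then be passed to something like `MyModel.objects.raw(sql, params)`, or to `cursor.execute(sql, params)` on a database connection cursor. The `FULLTEXT` index itself still has to exist on the column first.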

Rick James