1

I'm trying to create a table which one of their columns will hold characters more than 5000 characters and I don't want any row for this column to be repeated so I used a primary key to make every row in this column not save again when it's already exist

But the problem is when I'm trying to create this column with column_name VARCHAR(5500) Primary key it's giving me this error Specified key was too long; max key length is 767 bytes

I searched a lot and found that InnoDB engine accepts only 767 characters as max length and in MYISAM engine accepts 1000 character but this doesn't help me because this column maybe hold more than 5000 character

What I'm looking for is a way to create a column which no one of its rows can be repeated and accepts many characters

CREATE TABLE data_table (
    date_time VARCHAR(100),
    message VARCHAR(5500) PRIMARY KEY
) ENGINE = MYISAM CHARACTER SET latin1
O. Jones
  • 103,626
  • 17
  • 118
  • 172
Root Zxcvpn
  • 35
  • 1
  • 6

3 Answers3

1

You have hit a fundamental limitation. Sadly, no amount of negotiation or hacking will find you a way to make an index as long as you need. Therefore, a unique index is not a solution to your problem of preventing duplicate text strings.

Many people store a hash of long text fields along with the text.

SHA-256 is a decent choice for a hash. The issue with hashes is the chance of a hash collision. That is, it is possible that two different text strings will generate the exact same hash. With SHA-256 or larger hashes, that chance is very low indeed.

If you work with SHA-256, you need a column defined like this. (32 bytes is the same as 256 bits, of course.)

text_hash BINARY(32)

Then when you go to insert text you can do this.

INSERT INTO tbl (text, text_hash) VALUES(?, UNHEX(SHA2(?, 256));

If you make your text_hash into a unique index you'll have a way of preventing duplicates by throwing an error when trying. Something like this.

 CREATE UNIQUE INDEX no_text_dups_please ON tbl(text_hash);
O. Jones
  • 103,626
  • 17
  • 118
  • 172
0

Needs : "[one] column will hold characters (more than 5000 characters) and I don't want any row for this column to be repeated"

PRIMARY KEY add a UNIQUE CONSTRAINT on the field(s) specified, but if you don't need to use it as PRIMARY KEY use only UNIQUE. In addition, I would not recommend UNIQUE CONSTRAINT on large text column.

I would recommend you to check the unicity of your data by making and storing hashs of your texts.

SeeoX
  • 565
  • 3
  • 18
0

Sure, the Hash is one way. (I think the latest MariaDB has a technique for doing that by magic!) Here's another approach:

For many reasons, you should switch from MyISAM to InnoDB, but I will ignore that for this Q&A.

CREATE TABLE data_table (
    date_time VARCHAR(100),
    message VARCHAR(5500) PRIMARY KEY
    INDEX(message(100))
) CHARACTER SET utf8mb4   -- since you might get non-English test, including Emoji.

(The "100" is a tradeoff between speed and space.)

But you will have to do an extra test:

SELECT 1 FROM data_table WHERE message = ?

If you get something back, you have a dup -- take action. Else do an INSERT.

Oops, I do need to insist on InnoDB -- at least if you could have conflicting connections inserting the same message:

BEGIN;
SELECT 1 FROM data_table WHERE message = ? FOR UPDATE;
if ... then handle dup and don't COMMIT
INSERT INT data_table (date_time, message) VALUES (?, ?);
COMMIT;

You might want to hide all that inside a Stored Procedure.

Rick James
  • 135,179
  • 13
  • 127
  • 222