
I have a field (link) that is varchar(1500) and that I want to make unique. I changed the MySQL configuration to increase the maximum key length to 3072 bytes:

ROW_FORMAT=DYNAMIC, innodb_file_format = Barracuda, innodb_large_prefix = true

But when I add a UNIQUE index on the field, I get the following error:

"#1071 - Specified key was too long; max key length is 3072 bytes"

My field is varchar(1500), which is 3000 bytes.

What's wrong?

Update (1): table definition:

CREATE TABLE IF NOT EXISTS `pages` (
  `link` varchar(1500) NOT NULL,
  `domain` varchar(255) NOT NULL,
  `lastvisited` datetime DEFAULT NULL,
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
  KEY `link` (`link`(255))
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ROW_FORMAT=DYNAMIC;
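Because the table is declared with `DEFAULT CHARSET=utf8`, the size MySQL checks for the index is roughly the declared length times the character set's maximum bytes per character (3 for MySQL's `utf8`), not the size of the data actually stored. A small Python calculation, used here only as a calculator, compares that with the 2-bytes-per-character assumption implied by "3000 bytes" above:

```python
# Sketch: worst-case index key size for a VARCHAR column, assuming MySQL sizes
# the key as (declared length * max bytes per character of the charset).
MAX_KEY_BYTES = 3072  # limit quoted in error #1071


def worst_case_key_bytes(declared_chars: int, max_bytes_per_char: int) -> int:
    """Bytes reserved when indexing a whole VARCHAR(declared_chars) column."""
    return declared_chars * max_bytes_per_char


assumed = worst_case_key_bytes(1500, 2)  # the 2-bytes-per-char guess (3000)
actual = worst_case_key_bytes(1500, 3)   # MySQL utf8: up to 3 bytes per char

print(f"assumed: {assumed} bytes, fits: {assumed <= MAX_KEY_BYTES}")
print(f"utf8:    {actual} bytes, fits: {actual <= MAX_KEY_BYTES}")
```

Under `utf8` the full-column key needs 4500 bytes, which is why the 3072-byte limit is exceeded even though most URLs are short in practice.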

Update (2): ALTER command (run via phpMyAdmin):

ALTER TABLE  `pages` ADD UNIQUE (
`link`
)
Tigran
  • What is the character set of the table and the column? – Abdul Manaf Feb 25 '14 at 09:20
  • utf8. I updated the question. – Tigran Feb 25 '14 at 09:24
  • If you're using UTF-8, each character can take up to 3 bytes: that's 4500 bytes, which exceeds the maximum key length. I'd suggest that in any case a key that long is going to be unwieldy at best. – Feb 25 '14 at 09:24
  • You can make your `link` column a `TEXT` type if you want to store long strings in it. – Satish Sharma Feb 25 '14 at 09:25
  • Have a look at http://stackoverflow.com/questions/16568128/max-size-of-unique-index-in-mysql – Abdul Manaf Feb 25 '14 at 09:27
  • What is the purpose of having such a long prefix? – zerkms Feb 25 '14 at 09:29
  • @Tigran: Your `CREATE` statement doesn't show the `UNIQUE` constraint that you talk about in your question, just a non-unique `KEY`. – lanzz Feb 25 '14 at 09:29
  • TEXT and BLOB values are stored outside the table, with the table holding just a pointer to the location of the actual storage, so TEXT should be slower. – Tigran Feb 25 '14 at 09:29
  • @lanzz, it doesn't. I create the table and then try to add the UNIQUE index. – Tigran Feb 25 '14 at 09:30
  • @Tigran: Would be useful to see how you're doing that – lanzz Feb 25 '14 at 09:30
  • @Tigran: so what is the actual `ALTER` you're using? – zerkms Feb 25 '14 at 09:31
  • What should I use if I want `link` to be long (ideally 2048 characters) and inserts and selects to be as fast as possible? – Tigran Feb 25 '14 at 09:31
  • @Tigran: Are you sure you're going to select rows _based on the `link` column_? I.e., `WHERE link = `? – lanzz Feb 25 '14 at 09:32
  • @Tigran: just imagine how many of the leading characters will actually differ. I really doubt it makes sense to have an index prefix longer than 30-40 characters for a URL. – zerkms Feb 25 '14 at 09:32
  • @zerkms the table definition was edited into the question after I posted my comment –  Feb 25 '14 at 09:32
  • Yes. `link` is a URL; I add URLs here and select them for inspection. Usually the first part (the scheme, like http, and the domain) is common, and the next part, such as part of the query, can also be common. – Tigran Feb 25 '14 at 09:33
  • My question is whether you're going to select _only rows where `link` has a specific value_, not whether you're going to select the `link` column itself. – lanzz Feb 25 '14 at 09:35
  • @Tigran: do you have a typical set of URLs for your application? If so, `GROUP BY` a substring of the first N characters and count the entries. – zerkms Feb 25 '14 at 09:36
  • @lanzz, my common operations are: insert if not exists (adding a URL for inspection); update lastvisited set NOW() where ; select * where lastvisited IS NULL and domain = 'some.com'. Since the heaviest operation is the non-unique insert, which is done by 20-40 Java threads running simultaneously, I decided to add UNIQUE. – Tigran Feb 25 '14 at 09:38
  • @zerkms, no typical set. It's a university project, similar to a search-engine bot. URLs can be anything (it explores the web). – Tigran Feb 25 '14 at 09:40
  • @Tigran: so grab a few hundred or a few thousand random URLs from the documents you have and try it. I just picked some and found that 40 would be a good start, which you can change as soon as you get real stats for your app. – zerkms Feb 25 '14 at 09:45
  • What about making my URL unique? I think that a combination of insert () if not exists (select where url = 'some.com') will be extremely slow on 10 million records? – Tigran Feb 25 '14 at 09:48
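zerkms's suggestion (take a sample of real URLs, group by the first N characters and count) can also be tried offline before touching the table. A minimal Python sketch; the sample URLs and the helper names `distinct_prefixes` and `shortest_unique_prefix` are made up for illustration:

```python
# Sketch: how many leading characters are needed before URL prefixes are as
# distinct as the full URLs (an offline analogue of GROUP BY LEFT(link, N)).


def distinct_prefixes(urls: list[str], n: int) -> int:
    """Number of distinct values among the first n characters of each URL."""
    return len({url[:n] for url in urls})


def shortest_unique_prefix(urls: list[str]) -> int:
    """Smallest n whose prefixes are as distinct as the full URLs."""
    target = len(set(urls))
    longest = max((len(url) for url in urls), default=0)
    for n in range(1, longest + 1):
        if distinct_prefixes(urls, n) == target:
            return n
    return longest


# Hypothetical sample; in practice use a few hundred or thousand crawled URLs.
sample = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/",
]
for n in (7, 18, 20):
    print(n, distinct_prefixes(sample, n))
print("shortest unique prefix:", shortest_unique_prefix(sample))
```

The resulting N is only a candidate length for a plain prefix `KEY` such as the existing `link`(255). A `UNIQUE` index on a prefix enforces uniqueness of the prefix alone, so it would reject distinct URLs that share their first N characters.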

1 Answer


Since you will be storing URLs in the `link` column, you don't actually need UTF8 for it, because URLs can contain only ASCII characters. Specifying a plain ASCII character set for your `link` column will even allow you to raise its maximum length to 3072 characters.

CREATE TABLE IF NOT EXISTS `pages` (
  `link` varchar(1500) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  `domain` varchar(255) NOT NULL,
  `lastvisited` datetime DEFAULT NULL,
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
  UNIQUE KEY `link` (`link`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ROW_FORMAT=DYNAMIC;

(Updated as per @eggyal's suggestion for the ascii_bin collation)
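Where the 3072-character figure comes from can be checked with a short Python calculation, used here only as a calculator. It assumes the 3072-byte limit from the error message and MySQL's per-character maxima of 1 byte for `ascii` and 3 bytes for `utf8`:

```python
# Sketch: longest VARCHAR that can be indexed in full under a byte limit,
# assuming the limit is checked against (declared length * max bytes per char).
MAX_KEY_BYTES = 3072  # limit from error #1071 with large index prefixes

# Maximum bytes per character for the two MySQL character sets discussed here.
MAX_BYTES_PER_CHAR = {"ascii": 1, "utf8": 3}


def max_indexable_chars(charset: str, limit: int = MAX_KEY_BYTES) -> int:
    """Largest N such that a full index on VARCHAR(N) fits in `limit` bytes."""
    return limit // MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    print(f"{charset:6} VARCHAR up to {max_indexable_chars(charset)}")
```

So VARCHAR(1500), or the 2048 characters mentioned in the comments, can carry a full-column index only with `ascii`; under `utf8` the ceiling is 1024 characters.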

lanzz
  • +1 I was just about to hit "post" with exactly the same answer, albeit recommending a limit of 2000 characters per [What is the maximum length of a URL in different browsers?](http://stackoverflow.com/a/417184). Also, I would recommend specifying the `ascii_bin` collation for the column, since URL paths can be case-sensitive. – eggyal Feb 25 '14 at 09:52
  • Are you sure about 'URLs can contain only ASCII characters'? What about the Russian and Chinese zones? As far as I know, they use non-ASCII symbols in domain names. – Tigran Feb 25 '14 at 09:55
  • http://stackoverflow.com/questions/4683081/is-it-advisable-to-have-non-ascii-characters-in-the-url – Tigran Feb 25 '14 at 09:57
  • But anyway, I will try ascii_bin, and if I need to extend it I will create an additional table. – Tigran Feb 25 '14 at 09:58
  • @Tigran: Non-ASCII characters in URLs need to be percent-encoded. Reference: [RFC 3986](http://tools.ietf.org/html/rfc3986#section-2) – lanzz Feb 25 '14 at 09:58
  • @lanzz: More explicitly, they must be percent-encoded in the path, or Punycode-encoded in the hostname; but wherever they are placed, most browsers will perform the encoding "behind the scenes" whilst displaying non-ASCII characters to the user. – eggyal Feb 25 '14 at 10:00
  • I have a problem with your solution: when I create the table, I get the error "#1709 - Index column size too large. The maximum column size is 767 bytes." However, if I create the initial table, the error is "#1071 - Specified key was too long; max key length is 3072 bytes". – Tigran Feb 25 '14 at 10:45
  • I think it's related to `CHARACTER SET ascii COLLATE ascii_bin`. – Tigran Feb 25 '14 at 10:45
  • @Tigran: Couldn't reproduce, the posted `CREATE TABLE` statement successfully creates the table in my tests. Do you also have a `innodb_file_per_table=true` setting in your `my.cnf`? [Seems to be a requirement for `innodb_large_prefix`](http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_large_prefix). – lanzz Feb 25 '14 at 13:24
  • I put it into my.ini, but it doesn't work. I found a better solution: using a SHA-1 hash as the unique key, which also speeds things up. – Tigran Feb 25 '14 at 15:51
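The SHA-1 approach fits together with the encoding comments above: convert each URL to an ASCII-only form first (Punycode for the hostname, percent-encoding elsewhere), then hash it, so every spelling of the same address maps to the same fixed-size 20-byte key. A minimal Python sketch under those assumptions; `to_ascii_url` and `url_key` are hypothetical helper names, and the sketch ignores userinfo and IPv6 literals:

```python
import hashlib
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: RFC 3986 reserved characters except '#', '[' and
# ']', plus '%' so existing percent-escapes survive. Everything else that is
# not an unreserved ASCII character gets percent-encoded (as UTF-8).
SAFE = "!$&'()*+,;=:@/?%"


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of `url` (sketch: no userinfo, no IPv6)."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    # Python's built-in "idna" codec Punycode-encodes non-ASCII labels.
    host = host.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),
    ))


def url_key(url: str) -> bytes:
    """20-byte SHA-1 digest of the ASCII form, usable as a fixed-size key."""
    return hashlib.sha1(to_ascii_url(url).encode("ascii")).digest()


print(to_ascii_url("http://пример.рф/путь?q=1"))
print(url_key("http://example.com/a b").hex())
```

The digest could go into a hypothetical `BINARY(20)` column (say `link_hash`) carrying the `UNIQUE` key, while `link` keeps a plain prefix `KEY` or none at all. For the same ASCII string, MySQL's `UNHEX(SHA1(link))` yields the same 20 bytes server-side.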