0

I currently have an MySQL (Aurora MySQL based on MySQL 5.7.x) PROCEDURE that takes a stringified list of ids as TEXT and uses a CONCAT function to create a prepared statement for execution.

While the procedure works most of the time, I am getting truncation errors on a few calls to it (likely that have a lot of IDs). The error message is essentially:

com.mysql.jdbc.MysqlDataTruncation Data truncation: Data too long for column 'stuffIds' at row 1

My initial thought was that we had selected too small of a type for the PROCEDURE param stuffIds, but when I checked, I discovered it's type is TEXT. It seems REALLY unlikely that our service is maxing out that type, which to my knowledge is limited at 2^31 chars (or 2GB of data). We just don't have enough IDs in any list to hit that limit.

My next thought was that maybe it's related to the CONCAT function that we're using to create the prepared statement? But while I've seen lots of issues & answers around GROUP_CONCAT and its max length (and the related configs), I have not been able to locate any info on the max length of string supported by MySQL's CONCAT function.

The only thing left I could think of was that maybe it's related specifically to AWS Aurora's implementation of MySQL? I think Aurora is supposed to be pretty close to vanilla MySQL, but maybe this is an edge case?

The PROCEDURE itself (the names have been changed to protect the innocent) raising this error looks like:

DELIMITER //
CREATE PROCEDURE GetThingsByStuffIds(stuffIds TEXT)

BEGIN

SET @stmt = CONCAT('SELECT ThingId FROM ThingTable WHERE StuffId IN (', stuffIds, ')');

PREPARE insert_stm FROM @stmt;
EXECUTE insert_stm;
DEALLOCATE PREPARE insert_stm;


END //
DELIMITER ;

Any insights are appreciated. Thanks!


UPDATE/NOTE/WARNING!

As @Schwern pointed out, I had my MySQL TEXT type info wrong!

which to my knowledge is limited at 2^31 chars (or 2GB of data)

That part of my question was very incorrect! Unlike PostgreSQL, as @Schwern states below, MySQL has three sized variants of TEXT - TEXT (65,535 chars), MEDIUMTEXT (16,777,215 chars), LONGTEXT (4,294,967,295 chars). I misunderstood the length limits of TEXT, hence why my call was exceeding the limits. The full info on these types in MySQL is here. Adding this note so my misinformation in the question doesn't mislead anyone!

cjn
  • 1,331
  • 1
  • 16
  • 22
  • Is the column named `stuffId` or `stuffIds`? The error message says the column is `stuffIds`, but your query uses `stuffId`. – Barmar Sep 03 '21 at 22:03
  • I know you changed names, but you need to make it clear whether the error message is referring to the table column or the procedure parameter. – Barmar Sep 03 '21 at 22:06
  • BTW, instead of creating a prepared statement, you could just use the `FIND_IN_SET()` function. – Barmar Sep 03 '21 at 22:07
  • @Barmar `StuffID` is the column name, `stuffIds` is the param string list of IDs I'm using in the `IN` clause. `FIND_IN_SET()` is a good idea sometimes for simplicity, but unfortunately it doesn't work well for me because I've found that it disregards indexes and performance is essential in my case. `IN` is well optimized and will leverage indexes. Thanks for the suggestions though! – cjn Sep 04 '21 at 00:12

2 Answers2

1

text cannot be larger than 2^16 bytes or 65,535. Depending on the character set, characters can take up multiple bytes, but if it's simple UTF-8 it's probably one byte per character.

You could use mediumtext (2^24 bytes) or longtext (2^32 bytes), but really your table needs to be redesigned. Storing lists as comma separated strings is extremely inefficient, complex, and error prone. Instead, use a many-to-many join table.

create thing_stuff (
  thing_id int not null references thing(id),
  stuff_id int not null references stuff(id)
);

-- Thing 1 has Stuff 2 and 5.
insert into thing_stuff (thing_id, stuff_id) values
  (1, 2), (1, 5);

-- Get all the Stuff associated with Thing 1.
select stuff.*
from stuff s
join thing_stuff ts on ts.stuff_id = s.id
where thing_id = 1;

-- Thing 1 no longer has Stuff 5.
delete from thing_stuff
where thing_id = 1 and stuff_id = 5;
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • Thanks, @Schwern. You're absolutely right about `MEDIUMTEXT` & `LONGTEXT`. I somehow lived life never realizing those existed in MySQL - likely because I came from a PostgreSQL world prior (which lumps them as `TEXT`). In light of that, we were likely hitting the 65.5k limit (our id's are currently up to 10 digits length), so it all makes sense now. – cjn Sep 03 '21 at 23:55
  • And good tip, but no worries, we don't use comma separated strings in our tables. Those are only to build the queries in our stored procedures (because MySQL doesn't have a list-type, so we can't call a `PROCEDURE` with a List of `BIGINT`). `StuffId` on the `ThingTable` is a foreign key to `StuffTable` that we're querying via the list `IN` clause. – cjn Sep 03 '21 at 23:59
  • @cjn Consider putting the ids into a temp table and passing in the name of the table. Or if you can replace the need for a procedure with a view. – Schwern Sep 04 '21 at 00:03
1

You will encounter a Problem as the IN Clause has a limit to the number of values it can hold and it is directly linked the max_allowed_packet size

A better Solution is to split the id string and add it to a temporary table and join both, with big numbers of elements this is also faster

The split function i lend form https://stackoverflow.com/a/11835246/5193536

CREATE TABLE ThingTable(StuffId int,ThingId int)
INSERT INTO ThingTable VALUES (1,1),(2,2),(3,3),(5,5)
CREATE FUNCTION SPLIT_STR(
  x VARCHAR(255),
  delim VARCHAR(12),
  pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
       LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
       delim, '');
CREATE PROCEDURE GetThingsByStuffIds(stuffIds TEXT)

BEGIN
      DECLARE a INT Default 0 ;
      DECLARE str TEXT;
      DROP TEMPORARY TABLE IF EXISTS my_temp_table;
      CREATE TEMPORARY TABLE my_temp_table(aint int);
      simple_loop: LOOP
         SET a=a+1;
         SET str=SPLIT_STR(stuffIds,",",a);
         IF str='' THEN
            LEAVE simple_loop;
         END IF;
         #Do Inserts into temp table here with str going into the row
         insert into my_temp_table values (str);
   END LOOP simple_loop;

SELECT ThingId FROM ThingTable t1 INNER JOIN  my_temp_table t2 ON t1.StuffId = t2.aint;



END
CALL GetThingsByStuffIds('1,2,3,4,5,6')
| ThingId |
| ------: |
|       1 |
|       2 |
|       3 |
|       5 |

✓

db<>fiddle here

nbk
  • 45,398
  • 8
  • 30
  • 47
  • Good insight! I had some wrong thinking around type info in MySQL, so that turned out to be the issue this time. BUT, I was not aware of this limitation in the `IN` clause, so if that didn't bite us now, it likely could later. Thanks for the valuable insight, @nbk! – cjn Sep 04 '21 at 00:01