What I'm doing
I'm working on a chat application (written in PHP), which allows users to create their own chat rooms. A user may name a chat room anything they like and this name is passed on to the MySQL database in a prepared statement as the table name for that respective chat room.
It is understood that there is no log in / security measure for this application and the table holding the chat log is composed of records with simply the user submitted text and timestamp (2 columns, without counting an AUTO_INCREMENT
primary key).
What I'm facing
Given the simple nature of this application, I don't have the intention of changing the structure of the database, but I'm now running into the issue when a user enters emoji (or other supplementary characters) as the name for their own chat room. Passing such information on to the database as is will convert the characters into question marks, due to the way MySQL works internally (https://dev.mysql.com/doc/refman/5.7/en/identifiers.html):
Identifiers are converted to Unicode internally. [..] ASCII NUL (U+0000) and supplementary characters (U+10000 and higher) are not permitted in quoted or unquoted identifiers.
What should / can I do to avoid this problem? Is there a best practice for "escaping" / "sanitizing" user input in a situation like this? I put the respective words in quotation marks because I know it is not the proper / typical way of handling user input in a database.
What I'm trying
An idea I had was using rawurlencode()
to literally break down the supplementary characters into unique sequences that I can pass on to the database and still be sure that a chat room with the name is not confused with
. However, I have the impression based on this answer that this is not good practice: https://stackoverflow.com/a/8700296/1564356.
Tackling this issue another way, I thought of base64_encode()
, but again based on this answer it is not an ideal approach: https://stackoverflow.com/a/24175941/1564356. I'm wondering however, if in this case it would still be an acceptable one.
A third option would be to construct the database in a different way by issuing unique identifiers as the table names for each respective chat room and storing the utf8mb4
compatible string in a column. Then a second table with the actual chat log can be linked with a foreign key. This however complicates the structure of the database and doubles the amount of tables required. I'm not a fan of this approach.
Any ideas? Thanks!