0

While debugging my Rails App I found the following messages in the log file:

(0.1ms)  ROLLBACK
Completed 500 Internal Server Error in 25ms (ActiveRecord: 4.2ms)
ActiveRecord::StatementInvalid (Mysql2::Error: Incorrect string value: '\xF0\x9F\x98\x89 u...' for column 'description' at row 1: INSERT INTO `course` (`title`, `description`) VALUES ('sometitle', '<p>Description containing  and stuff</p>')

This seems to stem from my database being MySQL with not-quite-utf-8:

CREATE TABLE `course` (
  `id` int NOT NULL AUTO_INCREMENT,
  `title` varchar(250) DEFAULT NULL,
  `description` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2080 DEFAULT CHARSET=utf8;

According to the answers to this question CHARSET=utf8 is only capable of handling 3 byte characters, not 4 byte characters.

The Emoticon needs four bytes - see \xF0\x9F\x98\x89 in the log file.

I am wary of converting the whole database. I would rather forbid the use of emoticons and other 4 byte characters - they are really not necessary on my site.

What is the best way to do this in Rails?

bjelli
  • 9,752
  • 4
  • 35
  • 50

2 Answers2

0

Building on the regular expressions from these answers I implemented a validator:

# file /lib/three_byte_validator.rb
# forbid characters that use more than three byte
class ThreeByteValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    if value =~ /[\u{10000}-\u{10FFFF}]/
      record.errors.add attribute, (options[:message] || 'Keine Emoticons, keine UTF-8 Zeichen mit 4 Byte')
    end
  end
end

Now I can use this validator on the model that first had the problem:

class Course < ApplicationRecord
  validates :title, length: { in: 3..100 }, three_byte: true
  validates :description, length: { minimum: 50 }, three_byte: true

and also on other models:

class Person < ApplicationRecord
  validates :firstname, :lastname, :country, :city, :address, three_byte: true
bjelli
  • 9,752
  • 4
  • 35
  • 50
0

In MySQL, the Winking Face (and most other Emoji) needs utf8mb4 instead of utf8.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • yes, thank you! but as I said: I am wary of converting the whole database. it's in production, there is a lot of data. I'd rather not change something so fundamental. – bjelli Mar 08 '22 at 20:34