2

I have a tweet like this one: https://twitter.com/ATD_hashtagger/status/591204518253170689

#myownlittletest 3

When I try to add this to my database, it becomes this:

???????????????????????????????????????????????????????????????????????????????????????????????????????????????? #myownlittletest 1

My table has collation utf8mb4_general_ci, the same collation is also used for the field (TEXT) in which it is inserted. The value is inserted in the column named text.

I have a very simple query to insert it:

INSERT INTO `posts_twitter` (`hashtagId`, `coordinates`, `created`, `filterlevel`, `postId`, `language`, `profanity`, `retweeted`, `text`, `truncated`, `userId`, `username`, `userFullname`, `media`, `mediaType1`, `media1`, `mediaType2`, `media2`, `mediaType3`, `media3`, `mediaType4`, `media4`) VALUES (:hashtagId, POINT(:coordinatesLat,:coordinatesLon), :created, :filterlevel, :postId, :language, :profanity, :retweeted, :text, :truncated, :userId, :username, :userFullname, COALESCE(:media1,:media2,:media3,:media4), :mediaType1, :media1, :mediaType2, :media2, :mediaType3, :media3, :mediaType4, :media4) ON DUPLICATE KEY UPDATE `hashtagId` = `hashtagId`

The params are binded by PDO:

array(22) {
  ["coordinatesLat"]=>
  NULL
  ["coordinatesLon"]=>
  NULL
  ["created"]=>
  int(1429788536)
  ["filterlevel"]=>
  string(3) "low"
  ["postId"]=>
  int(591202087243014145)
  ["language"]=>
  string(3) "und"
  ["profanity"]=>
  bool(false)
  ["retweeted"]=>
  bool(false)
  ["text"]=>
  string(131) " #myownlittletest 1"
  ["truncated"]=>
  bool(false)
  ["userId"]=>
  int(2805078403)
  ["username"]=>
  string(14) "ATD_hashtagger"
  ["userFullname"]=>
  string(18) "Hashtag aggregator"
  ["media1"]=>
  NULL
  ["mediaType1"]=>
  NULL
  ["media2"]=>
  NULL
  ["mediaType2"]=>
  NULL
  ["media3"]=>
  NULL
  ["mediaType3"]=>
  NULL
  ["media4"]=>
  NULL
  ["mediaType4"]=>
  NULL
  ["hashtagId"]=>
  int(3)
}

My table is created using the following query:

CREATE TABLE IF NOT EXISTS `local_sysDB`.`posts_twitter` (
  `hashtagId` INT NOT NULL,
  `postId` CHAR(20) NOT NULL,
  `coordinates` POINT NULL,
  `created` INT(20) UNSIGNED NOT NULL,
  `filterlevel` VARCHAR(45) NOT NULL,
  `language` CHAR(11) NOT NULL,
  `profanity` TINYINT(1) NOT NULL,
  `retweeted` TINYINT(1) NOT NULL,
  `text` CHAR(240) NOT NULL,
  `truncated` TINYINT(1) NOT NULL,
  `userId` INT UNSIGNED NOT NULL,
  `username` CHAR(15) NOT NULL,
  `userFullname` CHAR(20) NOT NULL,
  `media` VARCHAR(255) NULL COMMENT 'First encountered media of COALESCE',
  `mediaType1` CHAR(5) NULL,
  `media1` VARCHAR(255) NULL,
  `mediaType2` CHAR(5) NULL,
  `media2` VARCHAR(255) NULL,
  `mediaType3` CHAR(5) NULL,
  `media3` VARCHAR(255) NULL,
  `mediaType4` CHAR(5) NULL,
  `media4` VARCHAR(255) NULL,
  PRIMARY KEY (`hashtagId`, `postId`),
  INDEX `posts_coordinates` (`coordinates` ASC)  COMMENT 'should be SPATIAL INDEX',
  FULLTEXT INDEX `posts_twitter_search` (`text` ASC, `username` ASC, `userFullname` ASC),
  CONSTRAINT `posts_twitter_posts`
    FOREIGN KEY (`hashtagId` , `postId`)
    REFERENCES `local_sysDB`.`posts` (`hashtagId` , `postId`)
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

This query results in:

Name           Type         Collation           Attributes  Null    Default
hashtagId      int(11)                                      No      None    
postId         char(20)     utf8mb4_general_ci              No      None    
coordinates    point                                        Yes     NULL    
created        int(20)                          UNSIGNED    No      None    
filterlevel    varchar(45)  utf8mb4_general_ci              No      None    
language       char(11)     utf8mb4_general_ci              No      None    
profanity      tinyint(1)                                   No      None    
retweeted      tinyint(1)                                   No      None    
text           char(240)    utf8mb4_general_ci              No      None    
truncated      tinyint(1)                                   No      None    
userId         int(10)                          UNSIGNED    No      None    
username       char(15)     utf8mb4_general_ci              No      None    
userFullname   char(20)     utf8mb4_general_ci              No      None    
media          varchar(255) utf8mb4_general_ci              Yes     NULL    
mediaType1     char(5)      utf8mb4_general_ci              Yes     NULL    
media1         varchar(255) utf8mb4_general_ci              Yes     NULL    
mediaType2     char(5)      utf8mb4_general_ci              Yes     NULL    
media2         varchar(255) utf8mb4_general_ci              Yes     NULL    
mediaType3     char(5)      utf8mb4_general_ci              Yes     NULL    
media3         varchar(255) utf8mb4_general_ci              Yes     NULL    
mediaType4     char(5)      utf8mb4_general_ci              Yes     NULL    
media4         varchar(255) utf8mb4_general_ci              Yes     NULL

When I run this query: SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

It gives me this result in phpMyAdmin:

Variable_name              Value
character_set_client       utf8
character_set_connection   utf8
character_set_database     utf8mb4
character_set_filesystem   binary
character_set_results      utf8
character_set_server       latin1
character_set_system       utf8
collation_connection       utf8_general_ci
collation_database         utf8mb4_general_ci
collation_server           latin1_swedish_ci

I use the following MySQL version:

Server: Localhost via UNIX socket

Server type: MySQL Server version: 5.6.24-log - MySQL Community Server (GPL)

Protocol version: 10

Server charset: UTF-8 Unicode (utf8)

I've already found out that using a BLOB field could help, but then I can't use a FULLTEXT index on the field text.

Is there a way to save emoji's to my database without transforming them into question marks?

This is not a duplicate, because my characters are not urning blank, but in question marks, also, solutions in the provided topic are not working.

  • 1
    change the colation to symply utf8_general_ci – NIRANJAN S. Apr 23 '15 at 12:17
  • 1
    I had utf8_general_ci before, but that didn't display emoji's as well –  Apr 23 '15 at 12:29
  • What version of MySQL are you using? The following [SQL Fiddle](http://sqlfiddle.com/#!9/9cf8a/1) seems it does the job. – wchiquito Apr 23 '15 at 13:12
  • 1
    @wchiquito, I've updated my question with the information you requested. –  Apr 23 '15 at 13:20
  • 1
    @wchiquito, I've tried to create a fiddle using the create table statement in my question: http://sqlfiddle.com/#!9/955d9. I've commented out the inserts because they generate an error. I'm not getting this error on my server though. –  Apr 23 '15 at 13:43
  • You need to use `utf8mb4` [**all the way through**](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through). You're using a plain `utf8` connection it appears. – deceze Apr 23 '15 at 14:12
  • 1
    I've added some settings into my.cnf from this website https://mathiasbynens.be/notes/mysql-utf8mb4 (step 5) –  Apr 23 '15 at 14:13
  • Add the `CHARACTER SET` and `COLLATE` to the column `text`. [SQL Fiddle](http://sqlfiddle.com/#!9/672c5/1). – wchiquito Apr 23 '15 at 15:22
  • 1
    @wchiquito, I'm using Mysql Workbench to generate the SQL code, any idea how I can let MySQL Workbench generate that into the output? –  Apr 24 '15 at 05:42

0 Answers0