I have an encoding problem that is driving me crazy. My web app supports both English and Spanish. Some of my tables (generated by Hibernate) use the utf8_general_ci collation; others, and I don't know why, use latin1_swedish_ci. What puzzles me is this: when someone submits my Contacts form with a word containing "ñ", my Spring controller receives it and, before saving the data to MySQL, sends me an email in which the text is correct (the ñ is intact). But when I check what was saved in MySQL (my Contacts table has the utf8_general_ci collation), garbage symbols appear in place of the accented characters, like Ãlvaro Núñez Cabeza de Váca. To summarize: the JSP pages declare UTF-8, the table is utf8_general_ci, and Hibernate is configured for UTF-8 too:

# hibernate props
hibernate.dialect=org.hibernate.dialect.MySQLDialect
hibernate.show.sql=true
hibernate.hbm2ddl.auto=update
hibernate.format_sql=true
# hibernate props added to fix 4bytes encoded characters
hibernate.connection.CharSet=utf8mb4
hibernate.connection.characterEncoding=utf8
hibernate.connection.useUnicode=true

But even with all of this in place, it is not working as expected.

Any help would be very welcome.

SOLUTION: At least for me, the only fix that worked was adding a filter to my web.xml. I am pretty sure there are more elegant ways to solve encoding problems, but in my case everything was already configured to use UTF-8, and some of my forms worked well while others showed Álvar Núñez Cabeza de Vaca as Ãlvar Núñez Cabeza de Vaca. The filter is:

<filter>  
    <filter-name>encodingFilter</filter-name>  
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>  
    <init-param>  
       <param-name>encoding</param-name>  
       <param-value>UTF-8</param-value>  
    </init-param>  
    <init-param>  
       <param-name>forceEncoding</param-name>  
       <param-value>true</param-value>  
    </init-param>  
</filter>  
<filter-mapping>  
    <filter-name>encodingFilter</filter-name>  
    <url-pattern>/*</url-pattern>  
</filter-mapping> 

This comes from the post "Spring MVC UTF-8 Encoding".
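
Why the filter helps can be sketched in plain Java. As far as I know, servlet containers fall back to ISO-8859-1 when a form POST does not declare a charset, and CharacterEncodingFilter with forceEncoding=true sets the request encoding to UTF-8 before any parameter is read. The example below only imitates that decoding step with java.net.URLDecoder (Java 10 or later); the class and method names are made up for illustration and are not part of Spring.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: imitates how a form value is decoded under a given request encoding.
final class FormDecodingSketch {
    // "N\u00fa\u00f1ez" as a browser would percent-encode it when the page is UTF-8.
    static final String SUBMITTED = "N%C3%BA%C3%B1ez";

    // Decode the raw form value using the encoding the server assumes for the request.
    static String decode(String rawValue, Charset requestEncoding) {
        return URLDecoder.decode(rawValue, requestEncoding);
    }

    public static void main(String[] args) {
        // Without the filter (assumed ISO-8859-1 default): each UTF-8 byte becomes its own character.
        System.out.println(decode(SUBMITTED, StandardCharsets.ISO_8859_1));
        // With the request encoding forced to UTF-8: the original text comes back.
        System.out.println(decode(SUBMITTED, StandardCharsets.UTF_8));
    }
}
```

If this is what was happening, the value was already garbled when it reached the controller, which would explain why changing the table's character set alone made no difference.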

lm2a

1 Answer

Sounds like there may be multiple problems going on. So, plan on making multiple fixes.

  • Use UTF-8 throughout. (Latin1 would work for English + Spanish, but the industry is moving away from that.)
  • Older versions of MySQL defaulted to latin1 character set (and latin1_swedish_ci collation).
  • Use CHARACTER SET utf8mb4 (not utf8) for MySQL. That is equivalent to UTF-8 in the outside world. In that snippet of config, change characterEncoding=utf8 to characterEncoding=UTF-8.
  • You mentioned utf8_general_ci -- change to utf8mb4... and consider utf8mb4_unicode_520_ci (best overall) or utf8mb4_spanish_ci or utf8mb4_spanish2_ci.
  • Núñez is Mojibake for Núñez. This happens when part of the system is talking latin1 and another part is talking UTF-8.
  • If you need to recover the messed-up data (due to Mojibake), we can discuss that. It is preferable to start over, using UTF-8/utf8mb4 throughout.
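
To make the Mojibake point concrete, here is a minimal Java sketch of how text gets garbled when UTF-8 bytes are read as latin1, and how that single step can be reversed in memory. The class and method names are hypothetical, and this only shows the mechanism; it is not a recipe for repairing rows that are already stored in MySQL.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the latin1/UTF-8 mismatch behind Mojibake.
final class MojibakeDemo {
    // Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1 (latin1).
    static String garble(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverse one round of garbling: recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "N\u00fa\u00f1ez";      // "Núñez", written with escapes so the source encoding cannot interfere
        String broken = garble(name);         // every accented letter turns into two characters
        System.out.println(broken.length());  // 7 characters instead of 5
        System.out.println(repair(broken).equals(name));
    }
}
```

ISO-8859-1 is used here because it maps every byte to a character, so the round trip is lossless; with Windows-1252 some bytes (such as 0x81, the second byte of "Á" in UTF-8) have no mapping.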

More on diagnosing Mojibake, etc: Trouble with UTF-8 characters; what I see is not what I stored
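
One way to tell which case you are in is to look at the bytes instead of the rendered characters. The sketch below (hypothetical class and method names) prints the hex that "ñ" produces under each scenario, which can then be compared with what SELECT HEX(column) shows for the stored value.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing byte patterns of correctly and wrongly encoded text.
final class HexSketch {
    // Hex dump (uppercase, no separators) of the text encoded in the given charset.
    static String hex(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String enye = "\u00f1"; // "ñ"
        // Correct UTF-8: C3B1
        System.out.println(hex(enye, StandardCharsets.UTF_8));
        // latin1: F1
        System.out.println(hex(enye, StandardCharsets.ISO_8859_1));
        // "Double encoding": UTF-8 bytes misread as latin1, then encoded as UTF-8 again: C383C2B1
        String misread = new String(enye.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(hex(misread, StandardCharsets.UTF_8));
    }
}
```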

For the collation differences: http://mysql.rjweb.org/utf8_collations.html -- mostly related to ch, ll, ñ -- Do you want those to be treated as 'separate letters'? (Should ch sort between cg and ci or after cz? Etc)

For Java/JDBC/Hibernate/etc: ?useUnicode=true&characterEncoding=UTF-8
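
As a small illustration of where those parameters go, the sketch below assembles a JDBC URL that carries them. The host, port, and database name are placeholders, and the helper itself is hypothetical; only the two query parameters come from the line above.

```java
// Hypothetical helper: builds a MySQL JDBC URL that declares the client encoding.
final class JdbcUrlSketch {
    static String utf8Url(String host, int port, String database) {
        // useUnicode and characterEncoding tell the driver which encoding the client side uses.
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the question.
        System.out.println(utf8Url("localhost", 3306, "mydb"));
    }
}
```

In a properties or XML file the same URL is written as plain text; inside XML the "&" has to be escaped as "&amp;".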

Rick James
  • Hi Rick, many thanks. I tried your suggestions and moved my problematic table to utf8mb4, but it still fails. Curiously, I then noticed that in the same database I have another table, USER, which is encoded as latin1 with collation latin1_swedish_ci, and it works well with Spanish characters. So I tried moving my other table from utf8 to latin1, but it still failed; then I moved it to utf8mb4_spanish_ci: ALTER TABLE Contacts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_spanish_ci; So now I am completely lost. – lm2a Jun 08 '19 at 11:31
  • @lm2a - It's not just the table; it is also the `SET NAMES` or connection parameters. _They_ tell mysql what encoding is used in the _client_. That is usually where the errors come. – Rick James Jun 08 '19 at 12:59
  • Thanks Rick, but both tables use the same connector. Here are the parameters: jdbc:mysql://my_dabase_ip:3306/mydatabasename?useUnicode=true&character_set_server=utf8mb4&createDatabaseIfNotExist=true&autoReconnect=true&connectTimeout=60000&socketTimeout=60000. I am using MySQL 5.7, but it is a Maria instance, on Google Cloud. And I added a flag: character_set_server = utf8mb4. As for SET NAMES, perhaps Hibernate does this, but I never did it myself. But how is it possible that, in the same context, one works and the other does not? – lm2a Jun 08 '19 at 15:49
  • Perhaps this matters: I found that the object which is saved correctly goes through a DAO that uses HibernateTemplate, while the object that is not working is saved using SessionFactory (also from Hibernate). AFAIK SessionFactory is the recommended Spring option, but here it seems HibernateTemplate is doing better. – lm2a Jun 08 '19 at 16:13
  • Furthermore, it does not seem to be a MySQL problem, because I can enter a name like Álvar Núñez Cabeza de Vaca using MySQL Workbench and it is persisted fine. So perhaps the problem is in my Hibernate configuration. – lm2a Jun 08 '19 at 17:05
  • @lm2a - I added to my Answer – Rick James Jun 08 '19 at 18:33