As part of a project, we needed to move from Ubuntu 14.04 to Ubuntu 16.04. However, since the move, full functionality has not been working correctly: character encoding is being jumbled when data is stored in the database. The same Debian package of the software produces different results on the two releases, implying either an issue at the ISO level, such as a different library, or some difference in Java behaviour.
The server that was upgraded in place is experiencing no problems; the fault persists only on fresh installs, which again implies an issue at the ISO level, but there is no obvious sign of which library or similar may have failed to install.
Logging was added to print the bytes as they are received, and Java reads them exactly as expected. However, once they are stored in the database, they are completely different. Persistence goes through a JPA connection set up earlier, and the connection already uses 'useUnicode=true&characterEncoding=UTF-8'. When Java reads the data back, it still believes it has the correct bytes when it does not. Likewise, if something is added directly to the DB, Java's debug logs do not show the correct bytes, yet the information still displays correctly via the interface, which can only have received it through this same path. This implies the issue is with storing the data rather than with handling it, even though both systems run the same version of the Debian package. The working version reads the bytes correctly when it gets them out of the database.
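For reference, the byte logging is along these lines: a small helper that hex-dumps the bytes Java actually holds, so they can be compared byte-for-byte with the database's HEX() output. The class name, example URL, and main method are illustrative only, not taken from the real code:

    import java.nio.charset.StandardCharsets;

    // Illustrative debugging helper: print a String's UTF-8 bytes as hex so
    // they can be compared with SELECT HEX(column) output from MySQL/MariaDB.
    public final class ByteLogger {

        // The JPA connection uses a URL of roughly this (hypothetical) form:
        // jdbc:mysql://localhost:3306/appdb?useUnicode=true&characterEncoding=UTF-8

        public static String hexDump(String value) {
            StringBuilder sb = new StringBuilder();
            for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            // On a healthy system this prints D8B4D984D8A7D8A4.
            System.out.println(hexDump("شلاؤ"));
        }
    }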
As an example, the Arabic text شلاؤ should be stored as the UTF-8 bytes "D8B4D984D8A7D8A4", and that is what the HEX() function in MySQL/MariaDB returns on the correct version, BUT on the incorrect version it returns "C398C2B4C399C284C398C2A7C398C2A4". This may provide more information as to why the encoding is failing to work correctly. With Java reading the incorrect bytes as if they were correct, this seems more likely to be an issue with Java, but the confusion remains because of the inconsistency between systems.
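For what it's worth, the broken value is exactly what classic double encoding produces: the correct UTF-8 bytes get re-read as Latin-1 (ISO-8859-1) somewhere and are then encoded to UTF-8 a second time (0xD8 is read as Ø, which re-encodes as C398, and so on). The following self-contained sketch, with illustrative names, reproduces the observed hex from the expected one:

    import java.nio.charset.StandardCharsets;

    public class DoubleEncodingDemo {
        public static void main(String[] args) {
            // The correct UTF-8 bytes for the Arabic text.
            byte[] utf8 = "شلاؤ".getBytes(StandardCharsets.UTF_8);
            System.out.println("expected: " + hex(utf8)); // D8B4D984D8A7D8A4

            // Misread those bytes as Latin-1, then encode to UTF-8 again:
            // this yields exactly the bytes seen on the broken install.
            String misread = new String(utf8, StandardCharsets.ISO_8859_1);
            byte[] doubled = misread.getBytes(StandardCharsets.UTF_8);
            System.out.println("observed: " + hex(doubled)); // C398C2B4C399C284C398C2A7C398C2A4
        }

        private static String hex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) {
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        }
    }

If that output matches, it would suggest some layer on the fresh installs (the JVM's default charset, the connection, or the database's default character set) is treating the text as Latin-1, though which layer is doing it is exactly what remains unclear.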