We use a PHP script to read data from an API. For each returned record, we compare a field from that record with data in a MSSQL database. Different logic is implemented depending on whether or not these two fields (strings) match.
Matches are case-insensitive and we also do not care about trailing/leading white space.
//value of $p['organizationName'] = 'Forté Initiative'
$groupNameForMatching = trim(strtolower($p['organizationName']));
//there is a group name in this table = 'Forté Initiative'
$matchCountPrep=$conn->prepare("SELECT COUNT(*) AS MatchCount FROM schema.table WHERE LOWER(LTRIM(RTRIM(GroupName))) = :OrganizationName);
$matchCountPrep->bindParam(":OrganizationName", $groupNameForMatching);
try {
$matchCountPrep->execute();
}
catch (PDOException $matchCountError) {
echo "ERROR: " . $matchCountError->getMessage();
exit();
}
$matchCount=$matchCountPrep->fetchColumn();
This code works as expected for all groups which do not contain special characters. For example:
"Chess club " and "Chess Club" match and the value of $matchCount = 1 for this group.
However "Forté Initiative" and "Forté Initiative" somehow do not match, even though they are identical strings. I believe this is probably due to the character encoding used for the character "é".
I do not have administrative nor dba access to the database server or instance. I do have administrator access to the web server and PHP configuration executing the script. However, this script will be migrated to another web server which I do not have administrative access to in the near future.
For the above reasons, I am looking for an in-code solution. Is there a way to force like-encoding on the two strings while making the comparison in the filter clause of the query?
Thank you for any suggestions. This is my first posting on Stack Overflow so please let me know if my question is lacking in any way and I will update. I appreciate your help!
EDIT: I received guidance on another site which led me to the problem with my code and two solutions.
The problem is that strtolower is not a multi-byte safe function and will not preserve native encoding. The accented character's encoding is broken when I use it. Additionally I was operating under a false assumption that the string comparisons would be case sensitive. The entire need for using strtolower does not actually exist.
Solution one: use mb_strtolower instead of strtolower. mb_strtolower successfully preserves the encoding and the MSSQL comparison behaves as expected when this function is used.
Solution two: compare the strings after trimming the white space but without forcing either side of the comparison to lower case. The filter clause of the query will still evaluate the strings as matching even if they do not match strictly by case. NOTE: this is true by default, but may not be true in all environments depending on the collation of your database.