This one ended up being much more of a pain to solve than it sounds as though it should have been. Essentially the data was corrupting through various processes where there wasn't a clearly defined encoding environment. Essentially the data was double-encoding as it was parsed so that “ became “ and then became “
This thread helped enormously with understanding the environmental issues and how to make sure PHP, MySQL and my coding environment were all using UTF-8 as its defaults.
This site was great with testing and fixing the mysql environment. The code from that page worked well on the server (accessed and changed using SSL). I use a WAMP server locally and the code didn't work well there. I ended up making the following changes to my my.ini file
[client]
...
default-character-set=utf8
[wampmysqld]
...
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
Then I needed to fix the data that was there.
I was going to just replace the offending code with safe single-byte characters (“ would be ") but in the end I decided to just make sure the multi-byte characters were corrected so that they displayed when viewed as UTF-8.
I didn't want to trust a script to encode the search and replace strings and I found a great starting point here. I used MySQL's UNHEX function to control this aspect:
-- Single quotes
UPDATE table SET column = REPLACE(column, UNHEX('C3A2E282ACCB9C'), UNHEX('E28098'));
UPDATE table SET column = REPLACE(column, UNHEX('C3A2E282ACE284A2'), UNHEX('E28099'));
-- Double quotes
UPDATE table SET column = REPLACE(column, UNHEX('C3A2E282ACC593'), UNHEX('E2809C'));
UPDATE table SET column = REPLACE(column, UNHEX('C3A2E282ACC29D'), UNHEX('E2809D'));
-- Em and En dashes
UPDATE table SET column = REPLACE(column, UNHEX('C3A2E282ACE2809C'), UNHEX('E28094'));
UPDATE table SET column = REPLACE(column, UNHEX('C3A2E282ACE2809D'), UNHEX('E28094'));
To make sure everything was working, I took a copy of the table and then replaced all of the changed multi-byte characters to single characters
UPDATE table SET column = REPLACE(column, UNHEX('E28098'), "'");
UPDATE table SET column = REPLACE(column, UNHEX('E28099'), "'");
UPDATE table SET column = REPLACE(column, UNHEX('E2809C'), '"');
UPDATE table SET column = REPLACE(column, UNHEX('E2809D'), '"');
UPDATE table SET column = REPLACE(column, UNHEX('E28093'), '-');
UPDATE table SET column = REPLACE(column, UNHEX('E28094'), '-');
and then ran this MySQL script to capture any records that still contained multi-byte characters:
SELECT * FROM table WHERE LENGTH(column) != CHAR_LENGTH(column);
This was I was able to quickly hunt down the anomalies in the existing data.
Hope this helps anyone else trying to transfer data across different systems.