
I am migrating an app from PHP to Rails and am having trouble displaying an em-dash. The field I am displaying has, according to both phpMyAdmin and the Rails console, the value "Mon,Tue & Thu: 8 a.m. â€“ 12 a.m.", where the â€“ is supposed to be an em-dash (the long dash). I am not sure why it is stored this way to begin with.

In PHP I display it with `<td><b>Opening Hours</b><br><?= nl2br($bar['opening_hours']) ?></td>`, and this renders as Mon,Tue & Thu: 8 a.m. – 12 a.m.

In Rails I display it with `= simple_format(@venue.opening_hours, style: "margin-bottom: 0px;")` in Slim. This, however, just renders as Mon,Tue & Thu: 8 a.m. â€“ 12 a.m.

Does anyone have any idea why this happens in the first place, and how PHP gets around it? I tried `echo nl2br("Mon,Tue & Thu: 8 a.m. – 12 a.m.");` on http://phpepl.cloudcontrolled.com/ and it just printed the string as is.

edit: outputting to `error_log` gets me Mon,Tue & Thu: 8 a.m. \xe2\x80\x93 12 a.m.

Karthik T
  • Getting `â` and the like means you have a charset issue, e.g. iso8859 in one place and utf-8 in the other. You have to maintain the SAME character set throughout your entire system, or hook the stages together with charset translation logic. – Marc B Jul 30 '14 at 16:59
  • @MarcB It occurred to me that might be where the problem lies, but I am not sure where to dig deeper. Do you have any suggestions for how to go about fixing this? Is it related to http://stackoverflow.com/questions/6769901/why-is-mysqls-default-collation-latin1-swedish-ci ? Can I fix it with http://stackoverflow.com/questions/6115612/how-to-convert-an-entire-mysql-database-characterset-and-collation-to-utf-8 ? How is it that the PHP code appears to work? My PHP codebase doesn't seem to specify the encoding of the database itself at any point, as far as I can see. – Karthik T Jul 31 '14 at 03:39
  • @MarcB following http://stackoverflow.com/questions/4773488/change-default-charset I discovered that my db is using `latin1`. Will try to convert this to utf8 on dev and see how that goes. – Karthik T Jul 31 '14 at 03:56
  • my `php.ini` has `;default_charset = "iso-8859-1"`. Could it be that if unspecified, `latin1` is the default, so it works well with the database? – Karthik T Jul 31 '14 at 04:13
  • doesn't matter what PHP's settings are if the db connection and/or db tables are set to something else. – Marc B Jul 31 '14 at 14:12

1 Answer


TL;DR - The data in the db was encoded as latin1, while my Rails app expected utf-8. I used this script to convert it -> profit!


Long version: After @MarcB's comment, I took a look at phpMyAdmin and confirmed that my tables' charset is indeed set to latin1, with collation latin1_swedish_ci. It also appeared that either PHP's MySQL extension detects this, or latin1 is simply the extension's default.

To verify, I manually set the encoding expected by PHP to utf-8, and presto, the display looked exactly as it did in my Rails app. Oddly, the reverse didn't work: when I set the Rails db encoding to latin1, the character changed, but not to the correct one. Regardless, the encoding mismatch appeared to be the problem.
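A minimal Ruby sketch of what this mismatch looks like at the byte level. The three bytes are the ones from the `error_log` output in the question; the helper names are made up for illustration, and treating MySQL's latin1 as Windows-1252 is an assumption on my part (it matches the `â` seen in the output).

```ruby
# The three bytes PHP logged for the dash (taken from the error_log output).
RAW_BYTES = "\xE2\x80\x93".b

# Hypothetical helper: read the bytes as UTF-8, which yields one character (U+2013).
def read_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: treat every byte as its own Windows-1252 character and
# transcode to UTF-8. Assumption: this mirrors what a utf8 connection does
# when it reads from a latin1 column.
def read_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
end

puts read_as_utf8(RAW_BYTES)    # the dash as a single character
puts read_as_latin1(RAW_BYTES)  # three characters, starting with a-circumflex
```

The bytes are identical in both calls; only the charset they are labelled with differs, which is why the same row can look fine in one app and broken in the other.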

To convert the tables and the data to utf-8, I first tried the solutions presented at "How to convert an entire MySQL database characterset and collation to UTF-8?". They did not work for me: there was no change on the front end.

Finally, after a lot of troubleshooting and searching, I came across this script, which appeared to do what I needed. I ran it against a copy of the production db, and it worked! It was only afterwards that I went through it to understand what it was doing. On top of changing the table configuration, it basically converts the data to binary, and then back again into the new encoding (utf-8).
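Here is a rough Ruby analogy for why the binary detour matters. It is not the script itself, just a sketch of the idea, and it assumes the column really held UTF-8 bytes under a latin1 label (Windows-1252 stands in for MySQL's latin1 here).

```ruby
# What the latin1 column believes it holds: three one-byte characters.
stored = "\xE2\x80\x93".b.force_encoding(Encoding::Windows_1252)

# Plain charset conversion: every character is transcoded, so the three
# wrong characters are faithfully carried over into UTF-8.
transcoded = stored.encode(Encoding::UTF_8)

# Binary round trip: drop the charset label, keep the bytes untouched,
# then relabel those same bytes as UTF-8.
as_binary  = stored.dup.force_encoding(Encoding::BINARY)
relabelled = as_binary.dup.force_encoding(Encoding::UTF_8)

puts transcoded.length        # 3
puts relabelled.length        # 1
puts relabelled == "\u2013"   # true
```

That would explain why a plain conversion showed no change on the front end, while going through binary fixed it.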

Through this process, my old data escaped intact, but some new data that I had recently imported got ruined, because it was the reverse of my original case. It had been imported by a Rails script (utf-8) into the latin1 database, which meant it looked fine in Rails but messed up in PHP. But this was a small amount of data, so I just cleared it and imported it again.
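A small Ruby sketch of why the two kinds of rows behave differently. The helper name and sample rows are hypothetical, and it assumes the Rails import was transcoded into real latin1 (Windows-1252) bytes on the way in, so the dash ended up as a single byte.

```ruby
# Hypothetical check: do these column bytes still make sense once they are
# relabelled as UTF-8 (which is what the binary round trip effectively does)?
def survives_relabelling?(column_bytes)
  column_bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Old row: UTF-8 bytes sitting in a latin1 column (written by the PHP app).
old_row = "8 a.m. \xE2\x80\x93 12 a.m.".b

# New row: genuinely latin1-encoded, the dash is the single byte 0x96.
new_row = "8 a.m. \u2013 12 a.m.".encode(Encoding::Windows_1252).b

puts survives_relabelling?(old_row)  # true
puts survives_relabelling?(new_row)  # false
```

Running a check like this over a dump before converting could flag the rows that need a re-import. Note that pure-ASCII rows pass either way, since ASCII bytes are the same in both encodings.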

Karthik T