I'm having trouble getting the correct character encoding for POSTed data that is built up from multiple sources (I receive it as a single POST variable). I think the sources are not all in the same character encoding.

For instance, take the symbol £. If I do nothing to the character encoding I get two results:

a = Â£ and b = £

I've tried using various configurations of iconv(), like so:

$data = iconv('UTF-8', 'windows-1252//TRANSLIT', $_POST['data']);

The above results in a = £ and b = �

I've also tried utf8_encode()/utf8_decode() as well as html_entity_decode(), as I think it's possible that one of the pound symbols is being generated using htmlentities().

I've also tried setting the character encoding in the header, which didn't work. I just can't get both instances to display correctly at the same time.

I'm not sure what to try next. Any ideas?

Novocaine
  • How is the `$_POST` data made? Is it possible to include the encoding of each part? – Halcyon Nov 10 '14 at 17:24
  • The post data is all coming from a database and is then collated into a single variable. This could technically be rewritten to set the encoding right for each part - however that's a far bigger task than I'd like. If it has to be done, it has to be done, but hopefully that's avoidable. – Novocaine Nov 10 '14 at 17:27
  • What is the encoding of the database tables? – Aris Nov 10 '14 at 18:16
  • There are quite a lot of tables; some are `utf8_general_ci`, others are `latin1_swedish_ci`. – Novocaine Nov 11 '14 at 10:18
  • I'd suggest you read [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) and [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) in addition to the linked duplicate. You really do not want to have to deal with converting encodings at all in your system. – deceze Nov 11 '14 at 15:18
  • Thanks for the links, I'll have a read. Just to note; the fact this data is not in a uniform encoding is unfortunately an inherited problem, not one I caused myself. I'm fully aware of the problems that can arise due to this ;) – Novocaine Nov 11 '14 at 15:27

1 Answer

I've managed to work around this by tracking down the content that was causing the problem (everything else was already in UTF-8) and passing just that content through utf8_encode().

This appears to work for the £ symbol. I've not found any other characters causing an issue so far.

Note, I am still using iconv() in conjunction with this.
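
A minimal sketch of this kind of workaround, assuming the mbstring extension is available and that any part which is not valid UTF-8 is ISO-8859-1 (the encoding utf8_encode() assumes). The helper name `to_utf8()` is hypothetical and not part of the original code. It uses mb_convert_encoding() rather than utf8_encode(), which is deprecated as of PHP 8.2:

```php
<?php
// Hypothetical helper: return $part as UTF-8.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function to_utf8(string $part): string
{
    if (mb_check_encoding($part, 'UTF-8')) {
        return $part; // already valid UTF-8, leave it untouched
    }
    // Same conversion utf8_encode() performs, with the source encoding explicit.
    return mb_convert_encoding($part, 'UTF-8', 'ISO-8859-1');
}

$utf8Pound   = "\xC2\xA3"; // £ encoded as UTF-8
$latin1Pound = "\xA3";     // £ encoded as ISO-8859-1

// Both parts end up as the same UTF-8 byte sequence.
var_dump(to_utf8($utf8Pound) === "\xC2\xA3");
var_dump(to_utf8($latin1Pound) === "\xC2\xA3");
```

This would need to run on each part before the parts are collated into the single POST variable: once differently encoded parts are joined, the string as a whole fails the UTF-8 check and converting it would double-encode the parts that were already correct. It also does not detect content that was double-encoded before it was stored.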

Novocaine