1

I am trying to pull data from a table and output it as a text (RTF) file. The problem is that there are some characters in the content that get mangled. For instance, if I have Spanish content, some of the characters are not recognized and get changed. For example, if I have:

'implementación'

the word gets changed to:

'implementación'

By using break points, I can see that the string coming from the database is correct, it's only when it gets printed out that the tilde get's changed. Below is my code:

           header("Content-Type: application/rtf; charset=utf-8;");
           header("Cache-Control: public");
           header("Content-Description: File Transfer");
           header("Content-Disposition: attachment; filename=".$fileName .".rtf");
           header("Content-Transfer-Encoding: binary");

           echo $content;

Thanks for your help.

jason

jason
  • 3,821
  • 10
  • 63
  • 120

2 Answers2

1

Match the output character set with the table's character set or convert the character set from the table with the character set you want to output.

Assuming the table uses US-ASCII to store data and we want to output it as UTF-8.

$content = iconv( 'US-ASCII', 'UTF-8//IGNORE//TRANSLIT', $content );
echo $content;

This will transliterate certain characters EG: € to EUR, and ignore/drop characters that are not known to the output character set.

If you are using Latin-1-General encoding in the table try CP850 (AKA: Code Page 850, MSDOS Latin-1) as opposed to US-ASCII.

http://us2.php.net/manual/en/function.iconv.php

You can optionally cast your encoding from within your query to the table For example with mysql

SELECT convert(cast(convert(content using  latin1) as binary) using utf8) AS content

MySQL - Convert latin1 characters on a UTF8 table into UTF8

This is useful if the data sent to the database was using a different character set than the table. For example sending ASCII or ISO-8859-1 data to a table/column using UTF-8 collation.

To find out the tables character encoding try:

SHOW CREATE TABLE `tablename`;

or How do I see what character set a MySQL database / table / column is?

For table encoding:

SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
       information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
  AND T.table_schema = "schemaname"
  AND T.table_name = "tablename";

For column encoding:

SELECT character_set_name FROM information_schema.`COLUMNS` C
WHERE table_schema = "schemaname"
  AND table_name = "tablename"
  AND column_name = "columnname";

Alternatively you can try changing the charset header in PHP to match the database table's output.

header("Content-Type: application/rtf; charset=ISO-8859-1;");
Community
  • 1
  • 1
Will B.
  • 17,883
  • 4
  • 67
  • 69
  • Thank you for your reply. I'm not sure I understand how the replace works. When I use your code, it just completely removes the offending letter. ''implementación' becomes ''implementacin' without the 'ó' – jason Feb 18 '14 at 14:54
  • oh, ok, that link helped to explain it. I'll see if I can get that to work. THanks! – jason Feb 18 '14 at 14:57
  • Added a way to do it within your query assuming mysql database table – Will B. Feb 18 '14 at 15:00
  • So, If I dont have access to the query (I cant make changes to it) the only way to fix this would be to use the iconv function? – jason Feb 18 '14 at 15:02
  • Yes you would need to convert it using PHP as opposed to SQL. there are several methods, however iconv normally has the most functionality. But you would need to know the table's character encoding to be able to convert from it to UTF-8 or set your charset=utf-8; header to the table's encoding type. – Will B. Feb 18 '14 at 15:04
  • so, my table appears to be utf8 (Collation: utf8 default collation). But, not all the data will be Spanish. There could be multiple langauges. How do I support them all? – jason Feb 18 '14 at 15:13
  • UTF-8 provides you with a set of characters that apply to multiple languages and is for the most part the most universally supported. If the table is UTF-8 it sounds like data was lost inputting the data to the table. (EG: submitting ACII data into the UTF-8 column). See http://www.utf8-chartable.de/ for a list of characters. – Will B. Feb 18 '14 at 15:24
0
  1. Check that your database text is defined to be UTF-8 (preferably, all text in the database should be the same enoding).
  2. Check that your page output is UTF-8 and not the default Latin-1/ISO-8859-1 (or other single byte encoding, such as Windows-1252).
  3. Go into phpMyAdmin and browse the table's data, to make sure the data was actually received and processed as UTF-8. You will need to check that the phpMyAdmin browse page is actually displaying in UTF-8.
  4. If the table/field is UTF-8, and the page is UTF-8, but you still get the two characters, it is very likely that a UTF-8 backup (.sql file) was improperly imported as Latin-1 rather than UTF-8, and the two bytes of ó were individually translated to UTF-8 multibyte characters. You have to remember to tell phpMyAdmin when you IMPORT an .sql file what the file is encoded in. This is difficult to clean up, especially if you now have a mixture of encodings in your database.
Phil Perry
  • 2,126
  • 14
  • 18