0

I am inserting text from a CSV file that contains characters from different languages.

I apply this function to every string:

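    private function process_elements($element){
        // utf8_encode() returns the converted string; it does not modify its argument.
        // Note: it assumes the input is ISO-8859-1.
        return utf8_encode($element);
    }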

The problem is when they go into the database, they go like this:

???????? ?? ???????????? ????? ??????? ??? ???????...

When I retrieve them from the database, I get the same thing.

This happens with Greek. However, when I retrieve Greek pages through scraping, from pages that are UTF-8 encoded, the characters look like this:

Δες webcam δωμάτια | Gr.ImLive.com

which is okay, because after I apply the utf8_encode function they look normal on screen.

But when the data is taken from the CSV and put into the database, I get those question marks.

Is there a way to convert text in any language to UTF-8? Why does reading data from a CSV behave so differently from reading a UTF-8 encoded web page, when the text looks the same? How do I address this problem?

Dmitry Makovetskiyd

2 Answers

2

Please take a look at this article, it should help:

Handling Unicode Front To Back In A Web App

maxjackie
  • Also: [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) :) – deceze Jun 13 '12 at 07:57
  • Does it talk about how to convert Greek letters from a CSV file to UTF-8? That seems to be the problem. I get those ??? characters whenever I take the data and put it into the database, with the utf8_encode function and without it :( – Dmitry Makovetskiyd Jun 13 '12 at 07:58
  • In other words, I can fetch a UTF-8 document and handle it successfully, because I scrape Google, which uses UTF-8, so there is no problem with that. But when I get the data from a file, it has some other encoding (it is an Excel file), and the Greek words look like this: ??? – Dmitry Makovetskiyd Jun 13 '12 at 08:03
1

It's not about "languages", it's about encodings. Text is encoded as bits and bytes. A byte is just a byte; nothing in it says which encoding it belongs to. If you only have a blob of bytes, you cannot know what encoding it represents. You can guess, but a guess is not reliable. You have to know what encoding some text is in by reading the accompanying metadata. That may be documentation, a <meta> tag or an HTTP header. Then you need to treat the text as being in that encoding.
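To illustrate why a blob of bytes alone does not reveal its encoding, here is a minimal sketch. It assumes the mbstring extension is available; the three bytes and the two candidate encodings are chosen purely for illustration and are not taken from the asker's file.

```php
<?php
// Three raw bytes with no metadata attached (illustrative values).
$bytes = "\xC4\xE5\xF2";

// Interpreted as ISO-8859-7 (Greek), they decode to "Δες".
$asGreek = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-7');

// Interpreted as ISO-8859-1 (Western European), they decode to "Äåò".
$asLatin = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

// Both interpretations are valid; only outside knowledge tells you which is right.
var_dump($asGreek === $asLatin); // bool(false)
```

Both results are well-formed UTF-8, so no validity check can tell you which interpretation was intended.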

utf8_encode actually converts text from ISO-8859-1 to UTF-8. It does not simply encode anything to UTF-8, because it does not have the means to determine what something is encoded in either. If your text is already UTF-8 encoded or was not ISO-8859-1 encoded to begin with, you're just garbling the text (as you are).
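A short sketch of the garbling described above, assuming the mbstring extension is available. It uses `mb_convert_encoding` with an explicit ISO-8859-1 source, which performs the same ISO-8859-1 to UTF-8 conversion as `utf8_encode` (deprecated in recent PHP versions). The sample string is illustrative.

```php
<?php
// "Δες" as correct UTF-8: each Greek letter takes two bytes.
$utf8 = "\u{0394}\u{03B5}\u{03C2}";

// Same conversion utf8_encode() performs: every input byte is treated
// as one ISO-8859-1 character and re-encoded as UTF-8.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump(strlen($utf8));      // int(6)
var_dump(strlen($garbled));   // int(12): every byte became a two-byte character
var_dump($garbled === $utf8); // bool(false)

// The damage is reversible as long as nothing else touched the bytes.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8); // bool(true)
```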

deceze
  • I know that, it is correct. The problem is really not in utf8_encode but in the Greek data that I obtain from the CSV file. Example: Δες webcam δωμάτια. When I put it into the database, with or without any encoding, I get this: ??????. How do I deal with that? – Dmitry Makovetskiyd Jun 13 '12 at 08:01
  • What is the encoding of the CSV file? – deceze Jun 13 '12 at 08:09
  • I think I found the solution: re-encode the Excel file through Google Docs: http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding. The solution is not a PHP solution. – Dmitry Makovetskiyd Jun 13 '12 at 08:32
  • You can do it with PHP as well, you just need to figure out what encoding the text is encoded in, then convert it specifically from that encoding to UTF-8. – deceze Jun 13 '12 at 09:28
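A minimal sketch of the PHP approach mentioned in the last comment, assuming the mbstring extension is available. The thread never establishes the CSV's actual encoding, so ISO-8859-7 here is only an assumption for illustration, and `fields_to_utf8` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert every field of a CSV row from a known
// source encoding to UTF-8. The caller must supply the real encoding.
function fields_to_utf8(array $fields, string $sourceEncoding): array
{
    return array_map(
        fn (string $field): string =>
            mb_convert_encoding($field, 'UTF-8', $sourceEncoding),
        $fields
    );
}

// One row as it might come out of a Greek-encoded CSV (assumed ISO-8859-7).
// "\xC4\xE5\xF2" is "Δες" in that encoding; "webcam" is plain ASCII.
$row = ["\xC4\xE5\xF2", "webcam"];

$converted = fields_to_utf8($row, 'ISO-8859-7');

// Sanity check: the result should now be valid UTF-8.
var_dump(mb_check_encoding($converted[0], 'UTF-8')); // bool(true)
var_dump(bin2hex($converted[0]));                    // string(12) "ce94ceb5cf82"
```

The converted fields only survive the round trip if the database connection and the target columns are also set to UTF-8.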