
I have huge problems with encodings. I'm scraping text from some other sites with file_get_contents(), and the quotes become odd special characters or question marks. The strange thing is that text from several different sites IS UTF-8, yet the quotes come out differently when I receive it. When I run utf8_decode(), a quote in one UTF-8 text comes out as a quote, but in another UTF-8 text from a different site it becomes a question mark.
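One possible reason the same character arrives differently is that each site may send its text in a different charset, so no single fixed conversion suits all of them. A minimal sketch, assuming the site declares its charset in the HTTP `Content-Type` header (PHP fills `$http_response_header` after `file_get_contents()` on an `http://` URL). Both helper names are hypothetical:

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type value,
// e.g. "text/html; charset=ISO-8859-1" -> "ISO-8859-1". Returns null if none is declared.
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: scan raw response header lines and use the last Content-Type line,
// since after redirects PHP appends each response's headers in order.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        if (stripos($line, 'Content-Type:') === 0) {
            $charset = charset_from_content_type($line);
        }
    }
    return $charset;
}
```

Usage, assuming `$url` is the page being scraped: call `$html = file_get_contents($url)`, then `$charset = charset_from_headers($http_response_header) ?? 'UTF-8'`, and convert with `mb_convert_encoding($html, 'UTF-8', $charset)`. If the header carries no charset, the page may still declare one in a `<meta>` tag, which this sketch does not inspect.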

Is there any way to make all the text look right when I save it to the DB?

The collation of the database table is latin1_swedish_ci. I have tried changing it to utf8_unicode_ci, but it made no difference.

Edit:

I have now tried a bit more. These two each work, but for different texts. This one works for one text:

$source = utf8_encode($source);

And this one works for the others:

$source = mb_convert_encoding($source, 'HTML-ENTITIES', 'utf-8');

But I can't put the string through both: they don't work together, and each one breaks the texts that the other one fixes.
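A likely explanation: `utf8_encode()` assumes its input is ISO-8859-1, so running it on text that is already UTF-8 encodes it a second time, while `mb_convert_encoding(..., 'HTML-ENTITIES', 'utf-8')` assumes the input is UTF-8. Each call is right for one kind of site and wrong for the other. A minimal sketch of converting exactly once, assuming that any text which is not valid UTF-8 is Windows-1252 (the superset of ISO-8859-1 that has the curly quotes at bytes 0x93 and 0x94). `normalize_to_utf8` is a hypothetical name, and the mbstring extension is required:

```php
<?php
// Hypothetical helper: normalise scraped text to UTF-8 exactly once.
// Assumption: anything that is not valid UTF-8 is Windows-1252.
function normalize_to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        // Already valid UTF-8: converting it again (e.g. with utf8_encode) would double-encode it.
        return $text;
    }
    // Bytes such as 0x94 become the proper UTF-8 curly quote (U+201D).
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}
```

The validity check is a heuristic: a Latin-1 string can occasionally happen to be valid UTF-8. Applying it once, right after `file_get_contents()` and before saving, keeps every text in a single encoding, and calling it twice is harmless.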

Screenshot of the output without any conversion applied (the text is in Swedish): [image not available]

Edit:

FYI: I have now changed the table to utf8_unicode_ci, but it still isn't working. Here are all the functions I've tried: [screenshot not available]

Actually, if I just leave it as it is, most of the texts are output with the right characters. It's only in some of them that " becomes ”.

Peter Westerlund

1 Answer


Can you please dump the content you grabbed using print_r()?

Note: your HTML page must declare the correct charset in a meta tag to display Unicode characters correctly.

<head>
    <meta charset="UTF-8">
</head>
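On the output side, a minimal sketch assuming the stored text is already UTF-8: escape the value with `htmlspecialchars()` and an explicit `'UTF-8'` encoding before printing it inside an input field, and send the same charset in the HTTP header as in the meta tag. `render_text_input` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: print UTF-8 text from the DB as the value of a text field.
// ENT_QUOTES escapes both " and ' so a quote in the text cannot close the attribute early;
// ENT_SUBSTITUTE turns invalid UTF-8 bytes into U+FFFD instead of returning an empty string.
function render_text_input(string $name, string $value): string
{
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;
    return '<input type="text" name="' . htmlspecialchars($name, $flags, 'UTF-8')
        . '" value="' . htmlspecialchars($value, $flags, 'UTF-8') . '">';
}

// In a real page, send this before any output so it matches <meta charset="UTF-8">:
// header('Content-Type: text/html; charset=UTF-8');
```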
GrafiCode
  • Does the page need the right meta charset even for text fields like ``? That's where I'm printing them. I have added a screenshot to my post above. Everything is in Swedish. And yes, the Content-Type is UTF-8. – Peter Westerlund Sep 18 '15 at 19:53
  • Well, yes: wherever you output your data (even inside the value attribute of an input box), the meta charset should always be declared to match the DB collation. – GrafiCode Sep 18 '15 at 20:00
  • I just want all the text going into the DB to be in the same encoding... I don't know what to do :/ – Peter Westerlund Sep 18 '15 at 20:05