67

I've tried converting the text to and from UTF-8, which didn't seem to help.

I'm getting:

"Itâ€™s Getting the Best of Me"

It should be:

"It’s Getting the Best of Me"

I'm getting this data from this url.

Michael
Mint

16 Answers

92

To convert to HTML entities:

<?php
  echo mb_convert_encoding(
    file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
    "HTML-ENTITIES",
    "UTF-8"
  );
?>

See docs for mb_convert_encoding for more encoding options.
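Note that the "HTML-ENTITIES" pseudo-encoding is deprecated in mbstring as of PHP 8.2. A minimal sketch of one alternative, assuming numeric entities are acceptable in the output, uses mb_encode_numericentity(). The sample string and the variable names here are illustrative only, and the curly quote is written as raw UTF-8 bytes so the example does not depend on how the source file is saved:

```php
<?php
// U+2019 RIGHT SINGLE QUOTATION MARK written as raw UTF-8 bytes.
$text = "It\xE2\x80\x99s Getting the Best of Me";

// Conversion map: [first code point, last code point, offset, mask].
// This one encodes every code point from U+0080 upwards and leaves ASCII alone.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$html = mb_encode_numericentity($text, $convmap, 'UTF-8');

echo $html, PHP_EOL; // It&#8217;s Getting the Best of Me
```

The result is a decimal entity (&#8217;) rather than the named &rsquo;, which browsers render the same way.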

Community
Matthew
  • That works, though I can't figure out how to get it to work on fopen – Mint Feb 19 '10 at 04:11
  • 7
    Once you get the contents of the file you want, then pass it in as the first parameter to `mb_convert_encoding()`. e.g., `$text = fgets($fp); $html = mb_convert_encoding($text, "HTML-ENTITIES", "UTF-8");` – Matthew Feb 19 '10 at 04:46
  • domain is not valid anymore. – mtness Jun 05 '14 at 09:39
  • What about in a URL where the html entity wouldn't make a valid URL for something like an RSS feed. – Titan Apr 22 '15 at 08:33
  • @GreenGiant: My answer simply shows you how to convert from one encoding to another. URLs (excluding domains) can include Unicode characters; at least modern browsers know how to display them. e.g., this is a valid URL: http://en.wikipedia.org/wiki/. (Although SO is eating the slash after wiki.) So UTF-8 is generally an acceptable encoding for URLs. But if you wanted to avoid that, you could try using 'ASCII' for the second parameter. It obviously won't support as many characters though, so you may end up with '?' placeholders. – Matthew Apr 23 '15 at 01:06
33

Make sure your HTML header specifies UTF-8:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

That usually does the trick for me (obviously if the content IS utf8).

You don't need to convert to html entities if you set the content-type.

Ben
  • This has got to be the greatest post ever! I updated my charset to utf-8 and it instantly fixed all my database driven pages. Thanks for that awesomely quick fix! – Jamie Apr 03 '13 at 19:30
  • This should be accepted as the answer because it's a global solution. – Keith Petrillo Oct 24 '18 at 14:25
12

Your content is fine; the problem is with the headers the server is sending:

Connection:Keep-Alive
Content-Length:502
Content-Type:text/html
Date:Thu, 18 Feb 2010 20:45:32 GMT
Keep-Alive:timeout=1, max=25
Server:Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.7 with Suhosin-Patch
X-Powered-By:PHP/5.2.4-2ubuntu5.7

Content-Type should be set to Content-type: text/plain; charset=utf-8, because this page is not HTML and uses the utf-8 encoding. Chromium on Mac guesses ISO-8859-1 and displays the characters you're describing.

If you are not in control of the site, specify the encoding as UTF-8 to whatever function you use to retrieve the content. I'm not familiar enough with PHP to know how exactly.
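To see where the stray characters come from, here is a rough sketch that reproduces the mis-decoding in PHP. It assumes the browser treats the ISO-8859-1 guess as Windows-1252, which is what browsers do in practice. The variable names are made up for the illustration:

```php
<?php
// "It’s" as UTF-8 bytes: the curly quote U+2019 is E2 80 99.
$utf8 = "It\xE2\x80\x99s";

// Misread those bytes as Windows-1252 and re-encode the result as UTF-8.
// Each of the three bytes turns into its own character.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
echo $garbled, PHP_EOL; // Itâ€™s

// Reversing the conversion recovers the original bytes.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
var_dump($repaired === $utf8); // bool(true)
```

The round trip only works while the garbled text is intact, so it is better to fix the declared charset than to repair the text afterwards.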

Michael
cobbal
10

I know the question was answered, but setting the meta tag didn't help in my case and the selected answer was not clear enough, so I wanted to provide a simpler answer.

To keep it simple, store the string in a variable and process it like this:

$TVrageGiberish = "It’s Getting the Best of Me";

$notGiberish = mb_convert_encoding($TVrageGiberish, "HTML-ENTITIES", 'UTF-8');

echo $notGiberish;

This should return what you wanted: It’s Getting the Best of Me

If you are parsing something, you can perform the conversion while assigning values to a variable, like this. Here $TVrage is an object holding all the values (XML from a feed, in this example) with a "Title" tag that may contain special characters such as ‘ or ’.

$cleanedTitle = mb_convert_encoding($TVrage->title, "HTML-ENTITIES", 'UTF-8');
Tumharyyaaden
4

If you're here because you're experiencing issues with junk characters in your WordPress site, try this:

  1. Open wp-config.php

  2. Comment out define('DB_CHARSET', 'utf8') and define('DB_COLLATE', '')

    /** MySQL hostname */
    define('DB_HOST', 'localhost');
    
    /** Database Charset to use in creating database tables. */
    //define('DB_CHARSET', 'utf8');
    
    /** The Database Collate type. Don't change this if in doubt. */
    //define('DB_COLLATE', '');
    
Michael
questCorp
  • Don't do this on any live site - it has a high probability of blowing out any existing theme-related options. This answer also does not solve the issue. – Howdy_McGee Jun 30 '22 at 15:09
4

Just try this.

If $text contains strange characters, do this:

$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');

And you are done.

semirturgay
3

We had success going the other direction using this:

mb_convert_encoding($text, "HTML-ENTITIES", "ISO-8859-1");
3

It sounds like you're using standard string functions on a UTF-8 character (’) that doesn't exist in ISO 8859-1. Check that you are using Unicode-compatible PHP settings and functions. See also the multibyte string functions.
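As a small illustration of the difference (a sketch only, with the quote written as raw UTF-8 bytes so it does not depend on the file's encoding):

```php
<?php
// U+2019 RIGHT SINGLE QUOTATION MARK as UTF-8: three bytes, one character.
$quote = "\xE2\x80\x99";

var_dump(strlen($quote));                     // int(3): strlen() counts bytes
var_dump(mb_strlen($quote, 'UTF-8'));         // int(1): mb_strlen() counts characters
var_dump(mb_check_encoding($quote, 'UTF-8')); // bool(true): the bytes are valid UTF-8
```

Byte-oriented functions such as substr() can cut a character like this in half, while the mb_* equivalents work on whole characters.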

pr1001
3

If nothing else seems to work, this could be your best solution.

<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "&#39;", $content);
echo $content;
?>

==or==

<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
ShapCyber
1

Try this:

html_entity_decode(mb_convert_encoding(stripslashes($text), "HTML-ENTITIES", 'UTF-8'))
Softmixt
1

For fopen and file_put_contents, this will work:

str_replace("&rsquo;", "'", htmlspecialchars_decode(mb_convert_encoding($string_to_be_fixed, "HTML-ENTITIES", "UTF-8")));
Rehmat
1

You should check the encoding of the source, then convert it to the correct encoding.

In my case, I read CSV files and then import them into a DB. Some files displayed well and some did not. I checked the encoding and saw that files encoded as ASCII displayed well, while files encoded as UTF-8 were broken. So I used the following code to convert the encoding:

if(mb_detect_encoding($content) == 'UTF-8') {
    $content = iconv("UTF-8", "ASCII//TRANSLIT", $content);
    file_put_contents($file_path, $content);
} else {
    $content = mb_convert_encoding($content, 'UTF-8', 'UTF-8');
    file_put_contents($file_path, $content);
}

After converting, I write the content back to the file and then run the import into the DB. Now it displays well in the front end.

V.Tran
  • I am having †instead of an apostrophe in Gmail title. And ASCII fix the problem. BTW, 1. charset is already set to UTF-8 and it did not work. 2. mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8'); turned out to be "’" displayed in title. So this answer is the only fix for my case. – Randy Lam Sep 02 '21 at 03:13
1

If none of the above solutions work:

In my case I noticed that the single quote was a different style of single quote. Instead of ' my data had a ’. Notice the difference in the single quote? So I simply wrote a str_replace to replace it and it fixed the problem. Probably not the most elegant solution but it got the job done.

$string= str_replace("’","'",$string);
crokadilekyle
0

use this

<meta http-equiv="Content-Type" content="text/html; charset=utf8_unicode_ci" />

instead of this

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Fabian Schmengler
karuppub
0

I looked at the link, and it looks like UTF-8 to me. i.e., in Firefox, if you pick View, Character Encoding, UTF-8, it will appear correctly.

So, you just need to figure out how to get your PHP code to process that as UTF-8. Good luck!

C. K. Young
0

If nothing works, try this: mb_convert_encoding($elem->textContent, 'UTF-8', 'utf8mb4');

Mukta Chowdhary