-1

I am learning cURL to fetch data from a site. Everything works fine except for special characters. The page source contains the following items:

<li class="page_item page-item"><a href="../categories/mens-health/">Men&#8217;s Health</a></li>
<li class="page_item page-item"><a href="../categories/nails-hair-skin/">Nails, Hair &#038; Skin</a></li>
<li class="page_item page-item"><a href="../categories/womens-health/">Women’s Health</a></li>  

When I collect the data in an array and echo it in the browser, I get the following result:

Men&#8217;s Health  
Nails, Hair &#038; Skin  
Women’s Health

which I got by executing the following code

$search = array('&#146;');
$replace = array("'");  
$category_names[] = htmlentities(str_replace($search, $replace, $word), ENT_QUOTES);

$word is each of the three array items above. However, I cannot convert them to the proper characters when inserting them into the database. This is how they appear in my DB:

Men&amp;#8217;s Health
Nails, Hair &amp;#038; Skin
Women&rsquo;s Health

How can I insert them in the proper format, as follows?
Men's Health
Nails, Hair & Skin
Women's Health

I checked some of the solutions for inserting an apostrophe, but they mostly cover single INSERT statements, whereas I am inserting in a loop.

Way to insert text having ' (apostrophe) into a SQL table
How do I escape a single quote in SQL Server?

I then called html_entity_decode($category_names[$i]); and now I get the following result in my database:
Men’s Health
Nails, Hair & Skin
Women’s Health

User56756
  • http://php.net/manual/en/function.html-entity-decode.php ? – Kita Oct 01 '15 at 04:56
  • @Kita I have used html entities as shown in my question htmlentities(str_replace($search, $replace, $word), ENT_QUOTES); . Is that what you are trying to tell me?? Can you please elaborate – User56756 Oct 01 '15 at 05:00
  • @Shilekha `html_entity_decode` does the opposite of `htmlentities`. I've elaborated in my answer. – Kita Oct 01 '15 at 05:05

3 Answers

2

html_entity_decode will decode HTML entities, including numeric character references (NCRs). For example, &#8217; will become ’.

<?php
$in = 'Men&#8217;s Health  
Nails, Hair &#038; Skin  
Women’s Health';

echo html_entity_decode($in);

will print

Men’s Health  
Nails, Hair & Skin  
Women’s Health

The code above is hosted here: http://ideone.com/1rWL45
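
The double-encoded values reported in the question (Men&amp;#8217;s Health) are consistent with htmlentities() being applied to text that already contained entities. A minimal sketch of that round trip follows; the explicit flags and the 'UTF-8' charset argument are my assumptions, not something the question specifies, and the file is assumed to be saved as UTF-8.

```php
<?php
// Text as scraped: it already contains a numeric character reference.
$scraped = 'Men&#8217;s Health';

// htmlentities() encodes the leading "&" again, giving the double-encoded form.
$double = htmlentities($scraped, ENT_QUOTES, 'UTF-8');
echo $double, "\n";   // Men&amp;#8217;s Health

// Decoding the scraped text instead yields the real character (U+2019).
$decoded = html_entity_decode($scraped, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";  // Men’s Health
```

In other words, the scraped text probably needs decoding once before storage, and encoding only at the point where it is echoed back into HTML.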

EDIT

Your DB table might be in Latin1, and inserting Unicode characters (e.g. ’) into it will result in such mangled characters. Simply replacing a few Unicode characters with ASCII equivalents may mitigate part of your encoding problem. However, I recommend altering the table's character set to UTF-8.

<?php

$string = 'Men’s Health'; // example input
$map = [ '’' => "'", "..." => "..." ]; // from->to pairs
$normalized = str_replace(array_keys($map), array_values($map), $string); // "Men's Health"
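
To see how such mangling can arise, here is a small sketch that reinterprets the UTF-8 bytes of ’ as Windows-1252 (the encoding MySQL's latin1 actually corresponds to). It assumes the mbstring extension is available and a UTF-8 source file, and it only simulates the mismatch; it does not talk to a database.

```php
<?php
// The right single quotation mark (U+2019) is three bytes in UTF-8: E2 80 99.
$utf8 = 'Men’s Health';
echo bin2hex('’'), "\n"; // e28099

// Treat those bytes as Windows-1252 and convert them to UTF-8, which is
// roughly what happens when the connection and table charsets disagree.
$mangled = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
echo $mangled, "\n";     // Menâ€™s Health
```

If the stored values look like this, the connection charset is a likely suspect as well as the table charset: with mysqli that is set through set_charset('utf8mb4'), and with PDO through charset=utf8mb4 in the DSN.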
Kita
  • Thank you, that fixed the ampersand. So now my database has entries as Men’s Health Nails, Hair & Skin Women’s Health – User56756 Oct 01 '15 at 05:05
  • @Shrilekha Broken characters in the DB are usually caused by character encoding settings. Check whether both PHP and your DB are using UTF-8. If not (this is a bit beyond the question's scope), try to make them use the same character set. – Kita Oct 01 '15 at 05:10
  • Ok I will check that and find out what really went wrong. – User56756 Oct 01 '15 at 05:13
  • Here is a SO answer regarding your problem http://stackoverflow.com/a/3854705/760211 – Kita Oct 01 '15 at 05:14
  • I did what was mentioned in the link you gave here. My character set was in fact latin-1, so I changed it to utf8 as guided in that post by Ashley Wlliams. The problem seems to persist, so I might keep looking for some other method. But I appreciate your quick response. – User56756 Oct 01 '15 at 05:34
  • @Shrilekha It seems my answer solves your original question. However, you've modified it to expand into another question. Why not accept my answer in the context of your original question? – Kita Oct 05 '15 at 02:07
0

Maybe jQuery's .html() and .text() functions can help you. For example:

html

<div id="test">&lt;&lt;</div>

jquery

var t = $('#test');
t.html(t.text());

Maybe this can help you: JS Fiddle link

AJAY SINGH
0

Certain characters have special significance in HTML, and should be represented by HTML entities if they are to preserve their meanings. This function returns a string with some of these conversions made; the translations made are those most useful for everyday web programming. If you require all HTML character entities to be translated, use htmlentities() instead.

htmlspecialchars — Convert special characters to HTML entities

string htmlspecialchars ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = true ]]] )

If the input string passed to this function and the final document share the same character set, this function is sufficient to prepare input for inclusion in most contexts of an HTML document. If, however, the input can represent characters that are not coded in the final document character set and you wish to retain those characters (as numeric or named entities), both this function and htmlentities() (which only encodes substrings that have named entity equivalents) may be insufficient. You may have to use mb_encode_numericentity() instead.

The translations performed are:

'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
"'" (single quote) becomes '&#039;' (or &apos;) only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
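
The translations above can be checked with a short sketch. The sample strings are made up for illustration, and the flags and 'UTF-8' charset are passed explicitly because the defaults differ between PHP versions.

```php
<?php
$name = "Nails, Hair & Skin <Men's>";

// ENT_COMPAT: double quotes would be encoded, single quotes are left alone.
$compat = htmlspecialchars($name, ENT_COMPAT, 'UTF-8');
echo $compat, "\n";  // Nails, Hair &amp; Skin &lt;Men's&gt;

// ENT_QUOTES also encodes the single quote.
$quotes = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
echo $quotes, "\n";  // Nails, Hair &amp; Skin &lt;Men&#039;s&gt;

// With $double_encode = false, existing entities are not encoded again.
$once = htmlspecialchars('Hair &#038; Skin', ENT_QUOTES, 'UTF-8', false);
echo $once, "\n";    // Hair &#038; Skin
```

The last call is relevant to the question: passing false for $double_encode is one way to avoid turning &#038; into &amp;#038; when the input already contains entities.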
Navnish Bhardwaj