This question is a continuation of: Garbage values coming on pulling data from wordpress

I have dealt with the garbage values by using the following piece of code:

 htmlentities($entry->title, ENT_QUOTES | ENT_IGNORE, 'UTF-8')

The problem with this code is that if the data contains a URL, it does not show the link but breaks it into something like the following:

&#8230; <a href="http://abc.com/blog/">Continue reading <span class="meta-nav">&#8594;</span></a>

Kindly let me know how to leave a URL untouched when one is present in the data.
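
For reference, a minimal sketch of what appears to be going on, assuming the data contains a real anchor tag. The `$title` string here is a hypothetical sample modelled on the excerpt above, not the actual feed data. htmlentities() escapes the markup characters along with everything else, so the tag ends up as visible text:

```php
<?php
// Hypothetical sample string, modelled on the excerpt in the question.
$title = '<a href="http://abc.com/blog/">Continue reading</a>';

// htmlentities() escapes <, > and quotes as well as non-ASCII characters,
// so the anchor tag is turned into plain text instead of a link.
$escaped = htmlentities($title, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

echo $escaped, "\n";
// prints: &lt;a href=&quot;http://abc.com/blog/&quot;&gt;Continue reading&lt;/a&gt;
```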

soft genic
  • html tags in URL? are you sanitizing inputs? – Mr. Alien Feb 14 '13 at 18:11
  • @Mr.Alien I am sanitizing the data I pull from WordPress so that I can ignore garbage values like `  , –`. The code I mentioned above gets rid of the garbage values but messes up the URL, as described above. – soft genic Feb 14 '13 at 18:12
  • Hi, you might want to take a look at my answer here: http://stackoverflow.com/a/14785592/382564 – Angad Feb 14 '13 at 18:47
  • @Angad Thanks, but the issue is still there. I used the function from your answer and it did not change anything. The code I mentioned above did the trick, but with the other issue described above. – soft genic Feb 14 '13 at 19:07
  • @softgenic Just posted an answer, have attempted to answer your use-case. It should work, let me know how it goes :) – Angad Feb 15 '13 at 08:25

2 Answers

This is a hacky solution, but judging by how you are approaching this without worrying about character encoding, you probably just want the damn thing to work.

First, we convert the hyperlinks into hacky BBCode. Then we clean up the text (the code below strips every non-ASCII character; the htmlentities() call is left commented out). Lastly, we replace the hacky BBCode with good old HTML. Have a look at this:

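$foo = 'Opening quietly in Chicagos West Loop, the Inspire Business Center is looking to take a more active role in Chicagos startup scene &#8230; Continue reading <span class="meta-nav">&#8594;</span>';
echo smartencode($foo);

function smartencode($str) {
     // Tags to keep, written as a regex alternation
     $tags = 'a|span';

     // Convert the allowed tags to hacky BBCode: <a href="..."> becomes [a href="..."]
     // (\b stops "a" from also matching tags such as <abbr>)
     $ret = preg_replace('/<(\/?(?:'.$tags.')\b[^>]*)>/', '[$1]', $str);

     // Remove so-called garbage: everything outside printable ASCII (space to ~),
     // which also strips tabs and newlines
     $ret = preg_replace('/[^\x20-\x7E]+/', '', $ret);
     // Left disabled: htmlentities() would also escape the quotes inside the bracketed tags
     // $ret = htmlentities($ret, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

     // Reinstate the tags as HTML
     $ret = preg_replace('/\[(\/?(?:'.$tags.')\b[^\]]*)\]/', '<$1>', $ret);
     return $ret;
}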

Again, it's not elegant. In fact, if you look closely you can find some pitfalls in it (for example, plain text that already contains something like [a ...] would be turned into a tag), but I think it could just work for your use-case.

Tested on http://writecodeonline.com/php/ and worked as expected.
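
If you would rather keep the special characters as entities than drop them, one alternative is to split the string on the allowed tags and run htmlentities() only on the text in between. This is a rough sketch of that different technique, not the function above: `encode_outside_tags` is a hypothetical name, the sample string is made up from the excerpts in this thread, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: entity-encode the text but leave <a> and <span>
// tags untouched, without the bracket placeholder step.
function encode_outside_tags(string $str, string $tags = 'a|span'): string
{
    // Split into alternating text / tag pieces; the capture group keeps the tags.
    $parts = preg_split(
        '/(<\/?(?:' . $tags . ')\b[^>]*>)/i',
        $str,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        // Even indexes are text between tags; odd indexes are the tags themselves.
        if ($i % 2 === 0) {
            // double_encode = false leaves existing entities such as &#8230; alone.
            $parts[$i] = htmlentities($part, ENT_QUOTES | ENT_IGNORE, 'UTF-8', false);
        }
    }

    return implode('', $parts);
}

// Made-up sample; \u{2019} is the curly apostrophe that shows up as garbage.
$sample = "Chicago\u{2019}s scene &#8230; <a href=\"http://abc.com/blog/\">Continue reading <span class=\"meta-nav\">&#8594;</span></a>";
echo encode_outside_tags($sample), "\n";
```

The curly apostrophe comes out as `&rsquo;`, while the anchor, the span, and the entities that were already in the string are left as they were. Any tag not listed in `$tags` is treated as text and escaped.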

Angad
  • Thanks, but it didn't work either. Kindly test it with the following text and you will notice it has no effect on garbage text like `’`: `Opening quietly in Chicago’s West Loop, the Inspire Business Center is looking to take a more active role in Chicago’s startup scene. “As Chicago’s` , - – soft genic Feb 15 '13 at 17:09
  • @softgenic I have modified the code, please test now. It is dropping the 'garbage' characters. – Angad Feb 15 '13 at 17:56
  • +1 for your effort, thanks again. Your updated answer got rid of the garbage values, but the link issue is still there. Kindly take a look at the following text converted by your function: `Opening quietly in Chicagos West Loop, the Inspire Business Center is looking to take a more active role in Chicagos startup scene … Continue reading ` – soft genic Feb 15 '13 at 18:00
  • Instead of `… Continue reading ` it should look like `… Continue reading →` – soft genic Feb 15 '13 at 18:01
  • @softgenic Modified again, do check – Angad Feb 15 '13 at 18:12
  • Thanks a lot. I wish I could give you one more vote, but alas, only one is allowed. Can you kindly let me know what else you changed in the answer's code? – soft genic Feb 15 '13 at 18:22
  • Kindly post your answer for the garbage values on the following question, so I can accept the appropriate answer over there too: http://stackoverflow.com/questions/14880551/garbage-values-coming-on-pulling-data-from-wordpress – soft genic Feb 15 '13 at 18:27
  • Adam's answer to your question is working - which answer of mine would you like me to post? The one to drop the garbage characters? The question currently says that you would like to display them and not drop them – Angad Feb 15 '13 at 19:21
  • Well, Adam's answer is not working in my case. I tested it at my end and it does nothing; it did not do anything to the following: `Center is looking to take a more active role in Chicago’s startup scene. “As Chicago’s` – soft genic Feb 16 '13 at 19:25

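The URL issue is fixed by the htmlspecialchars_decode() function, which converts the special HTML entities (&lt;, &gt;, &amp; and the quote entities) back into characters, so the tags that htmlentities() escaped become markup again.

The following lines of code fixed the garbage values issue as well as the URL: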

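$ret = $feed;
// Encode everything, then decode only the markup characters again.
// ENT_QUOTES on the decode side matches the flag used for encoding.
echo htmlspecialchars_decode(htmlentities($ret, ENT_QUOTES | ENT_IGNORE, 'UTF-8'), ENT_QUOTES);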
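
To make the two steps visible, here is a small sketch with the intermediate result kept in a variable. The `$feed` value is a hypothetical excerpt standing in for the real WordPress data, and the `$encoded` / `$decoded` names are only for illustration:

```php
<?php
// Hypothetical feed excerpt; \u{2019} is the curly apostrophe from the feed.
$feed = "Chicago\u{2019}s startup scene <a href=\"http://abc.com/blog/\">Continue reading</a>";

// Step 1: every character with a named entity is encoded, tags included.
$encoded = htmlentities($feed, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

// Step 2: only &lt; &gt; &amp; &quot; &#039; are decoded again, so the tag
// becomes markup while &rsquo; stays an entity.
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo $decoded, "\n";
// prints: Chicago&rsquo;s startup scene <a href="http://abc.com/blog/">Continue reading</a>
```

Note that this restores every tag in the data, not only links, so it is best suited to content you already trust.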
soft genic