I'm wondering how you're supposed to handle invalidly encoded URLs in PHP. PHP's urldecode decodes invalidly encoded query strings without complaint. See the example below.

// The % before "ads" is invalid here: it should have been encoded as %25 but was not.
$url = 'http://www.test.com/?invalid_parameter=t%C3%A9kst%ads';
$parsedUrl = parse_url($url);
// parse_str automatically urldecodes the values.
parse_str($parsedUrl['query'], $output);
// Outputs: tékst�s
var_dump($output['invalid_parameter']);

How can I either detect or remove the invalidly placed %, which results in a UTF-8 replacement character?

mtricht
  • Where are these invalidly encoded querystrings coming from? Best thing is don't invalidly encode them in the first place. – developerwjk Dec 10 '15 at 17:53
  • @developerwjk This is what I'm getting from an API, and I have no control over it. If I had control I would obviously change the URL, but I'm trying to catch their error here. – mtricht Dec 10 '15 at 17:54
  • What's the desired output here? – Kevin Lee Dec 10 '15 at 17:55
  • Maybe reality is different from what you expect: maybe this is actually a URL parameter that has not been encoded at all, so it is meant literally as a string containing percent characters. Sure, that's invalid, but it happens when newbies implement an API. And it would explain the sequence. – arkascha Dec 10 '15 at 17:56
  • @arkascha is completely right. You cannot use `parse_str` which implements `urldecode()` on a URL string that was incorrectly encoded in the first place. – Ohgodwhy Dec 10 '15 at 18:07
  • Kevin Lee: I would like to be able to detect that there's a UTF-8 replacement character in the string so I'm able to handle this 'error'. @arkascha you'd be surprised who this 'newbie' is. – mtricht Dec 10 '15 at 18:08
  • @Ohgodwhy I completely agree, but how can I prevent this if there's no way of knowing the URL is wrongly encoded? – mtricht Dec 10 '15 at 18:09
  • I doubt I'd be surprised. I've been in the business long enough and played with the big ones. They're just humans :-) Except if that is a Google API. Then I'd say: that was to be expected. – arkascha Dec 10 '15 at 18:09
  • Found these two posts on SO, hope they help: http://stackoverflow.com/questions/910793/detect-encoding-and-make-everything-utf-8, http://stackoverflow.com/questions/20025030/convert-all-types-of-smart-quotes-with-php/21491305#21491305 . I personally know nothing about encoding, except that it should return the proper desired output... – Andrew Dec 10 '15 at 18:17
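
For what it's worth, the decoded string in the example probably does not contain a real U+FFFD character. `%ad` is a syntactically valid percent-escape, so it decodes to the single byte 0xAD, which is not valid UTF-8 on its own and only gets displayed as �. A rough sketch of detecting that after decoding follows. It assumes the values are supposed to be UTF-8. The helper names `isValidUtf8` and `findBadlyEncodedParams` and the extra `ok` parameter are made up for illustration, and nested array parameters are simply skipped.

```php
<?php
// Hypothetical helper: true when $value is well-formed UTF-8.
// With the /u modifier PCRE validates the subject string, and
// preg_match() returns false (not 0) when it is not valid UTF-8.
function isValidUtf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}

// Hypothetical helper: names of the query parameters whose decoded
// value is not valid UTF-8. Nested (array) parameters are ignored here.
function findBadlyEncodedParams(string $url): array
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return []; // no query string, or an unparseable URL
    }

    // parse_str urldecodes every value, as in the question.
    parse_str($query, $params);

    $bad = [];
    foreach ($params as $name => $value) {
        if (is_string($value) && !isValidUtf8($value)) {
            $bad[] = $name;
        }
    }
    return $bad;
}

// The URL from the question, plus a correctly encoded parameter for contrast.
$url = 'http://www.test.com/?invalid_parameter=t%C3%A9kst%ads&ok=t%C3%A9kst';
var_dump(findBadlyEncodedParams($url));
```

For this URL the sketch should flag only `invalid_parameter`. Note that it cannot catch a stray % whose following two hex digits happen to decode to valid UTF-8 (for example `%41`), so it detects a symptom rather than proving the URL was encoded correctly.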

0 Answers