Well - the title basically says it.
I want to look at the URL's query string and search it for specific values (a single character, or a small string).
I can do this successfully - so long as I'm only looking for "normal" characters (those that are often termed "safe": [a-zA-Z0-9-_.~] ).
As soon as I start looking for 'unsafe' or 'foreign' characters - it gets ugly.
I've spent the entire day (and part of yesterday too) attempting to figure this out.
I've read tons ... RFCs, the php.net pages on encoding, encoding detection etc. too.
I've even attempted to set the encoding/charset at the top of the script etc. too.
I've gone through various encoding options, setting them dynamically, manually etc.
Nothing has worked.
Try the little script below.
Slap it into a file and access it - and append the query string below:
?q=a1-.<^舆
See what results you get.
<?php
// Rebuild the full URL of the current request from $_SERVER.
function curPageURL() {
    $pageURL = 'http';
    // $_SERVER["HTTPS"] is not set on plain HTTP requests, so check that it exists first.
    if (isset($_SERVER["HTTPS"]) && $_SERVER["HTTPS"] == "on") {$pageURL .= "s";}
    $pageURL .= "://";
    if ($_SERVER["SERVER_PORT"] != "80") {
        $pageURL .= $_SERVER["SERVER_NAME"].":".$_SERVER["SERVER_PORT"].$_SERVER["REQUEST_URI"];
    } else {
        $pageURL .= $_SERVER["SERVER_NAME"].$_SERVER["REQUEST_URI"];
    }
    return $pageURL;
}
// Array keys are quoted: bare words are treated as undefined constants.
$needles = array(
    'needle1' => 'a',
    'needle2' => '1',
    'needle3' => '-',
    'needle4' => '.',
    'needle5' => '<',
    'needle6' => '^',
    'needle7' => 'Ë',
    'needle8' => 'à',
    'needle9' => 'Ü'
);
// The haystack is the raw (still percent-encoded) query string.
$haystack = parse_url(curPageURL(), PHP_URL_QUERY);
if (!empty($haystack)) {
    $needlelist = implode(' | ',$needles);
    echo "We are looking for some needles ( ".$needlelist." ) in a haystack (".$haystack.")<br/>";
    // First pass: search for each needle exactly as typed in this file.
    foreach ($needles as $key=>$needle) {
        echo "We are looking for ".$key."<br/>";
        $check = strpos($haystack,$needle);
        if ($check !== false) {
            echo " - Yes : we found a needle (".$needle.") in the haystack";
        } else {
            echo " - No : we failed to find the needle (".$needle.") in the haystack";
        }
        echo "<br/>";
    }
    echo "--------------<br/>now let's try it with a little basing?<br/>";
    // Second pass: search for the percent-encoded form of each needle.
    foreach ($needles as $key=>$needle) {
        echo "We are looking for ".$key."<br/>";
        // Basing - encode the searched-for value, and replace any double-encoded % chars
        $needle = str_replace('%25','%',rawurlencode($needle));
        $check = strpos($haystack,$needle);
        if ($check !== false) {
            echo " - Yes : we found a needle (".$needle.") in the haystack";
        } else {
            echo " - No : we failed to find the needle (".$needle.") in the haystack";
        }
        echo "<br/>";
    }
}
I don't know about you, but instead of the strange characters, or their correct hex codes (as per the various lists/tables of URL-encoded chars), I get the following ([Searched for] (1st results) (2nd results)):
/a a a
/1 1 1
/- - -
/. . .
/< < %3C
/^ ^ %5E
/Ë Ã‹ %C3%8B
/à Ã %C3%A0
/Ü Ãœ %C3%9C
(/ added to prevent line insertion + the encoding here makes this Very difficult to post!)
The problem is - for example, the last one ... Ü should become %DC (as far as I can tell) - so why the paired hex?
I've tried reading up on multibyte stuff ... but I fail to see why the browsers encode the chars in the URL one way, while the script won't.
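To make the "paired hex" concrete, here is a minimal sketch of what rawurlencode() does with the two most likely byte forms of Ü. It is only an illustration, not a diagnosis: it assumes the two candidates are UTF-8 and ISO-8859-1, writes the bytes out explicitly so the result does not depend on how the file is saved, and assumes the mbstring extension is loaded for the conversion at the end.

```php
<?php
// Explicit byte strings, so the result does not depend on the file's own encoding.
$utf8   = "\xC3\x9C"; // U+00DC (Ü) stored as UTF-8: two bytes
$latin1 = "\xDC";     // U+00DC (Ü) stored as ISO-8859-1: one byte

// rawurlencode() works byte by byte, so each stored byte becomes one %XX triplet.
echo bin2hex($utf8), ' => ', rawurlencode($utf8), "\n";     // c39c => %C3%9C
echo bin2hex($latin1), ' => ', rawurlencode($latin1), "\n"; // dc => %DC

// Assumption: mbstring is available. Converting the UTF-8 form to ISO-8859-1
// before encoding gives the single-byte result.
$converted = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo rawurlencode($converted), "\n"; // %DC
```

If the script file itself is saved as UTF-8, the literal 'Ü' in $needles would be the two-byte form, which would match the %C3%9C in the results above.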
So - anyone see what I'm doing wrong, or not doing, or figured this out already?
For the sake of Clarity...
... I am NOT asking how to replace the characters (I do Not want to turn Ü into U).
I simply want to take a given string and see if it is in the URL (straight, or encoded for the URL).
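For reference, the kind of check described here could be sketched roughly as follows. This is a hedged illustration rather than a known-good answer: queryContains() is a hypothetical helper name, the needle is assumed to be a UTF-8 string, the only alternative encoding considered is ISO-8859-1, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: returns true if $needle occurs in $query, whether the
// query carries it raw or percent-encoded, as UTF-8 or as ISO-8859-1.
// Assumption: $needle itself is a UTF-8 string.
function queryContains(string $query, string $needle): bool
{
    if ($needle === '') {
        return false;
    }
    // Turn every %XX triplet back into its raw byte, so one search covers
    // both the "straight" and the encoded spelling.
    $decoded = rawurldecode($query);

    $candidates = array($needle);
    // Add the ISO-8859-1 form only if the needle survives a round trip;
    // characters outside ISO-8859-1 would otherwise collapse to '?'.
    $latin1 = mb_convert_encoding($needle, 'ISO-8859-1', 'UTF-8');
    if (mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1') === $needle) {
        $candidates[] = $latin1;
    }

    foreach ($candidates as $candidate) {
        if (strpos($decoded, $candidate) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(queryContains('q=a1-.%3C%5E%C3%9C', "\xC3\x9C")); // bool(true)
var_dump(queryContains('q=%DC', "\xC3\x9C"));              // bool(true)
var_dump(queryContains('q=abc', "\xC3\x9C"));              // bool(false)
```

One caveat with this byte-level approach: a single ISO-8859-1 byte can also appear as part of an unrelated UTF-8 sequence, so the second candidate may give false positives on queries that mix encodings.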
Thanks, and I hope someone can help.