I have a PHP script which processes user input. I need to escape all special characters, but also make links clickable (turn them into <a> elements). What I need is:

function specialCharsAndLinks($text) {
    // magic goes here
}
$inp = "http://web.page/index.php?a1=hi&a2=hello\n<script src=\"http://bad-website.com/exploit.js\"></script>";
$out = specialCharsAndLinks($inp);
echo $out;

The output should be (in HTML):

<a href="http://web.page/index.php?a1=hi&a2=hello">http://web.page/index.php?a1=hi&amp;a2=hello</a>
&lt;script src="http://bad-website.com/exploit.js"&gt;&lt;/script&gt;

Note that the ampersand in the link stays unencoded in the href attribute, but is converted to &amp; in the visible text of the link.

When viewed in a browser:

http://web.page/index.php?a1=hi&a2=hello <script src="http://bad-website.com/exploit.js"></script>

randomdude999

2 Answers

I eventually solved it with:

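function process_text($text) {
    $text = htmlspecialchars($text);
    // After escaping, every "&" starts an entity. The path may continue over "&amp;"
    // but stops at "&quot;", "&lt;" and "&gt;", so decoding cannot reinsert quotes or tags.
    $url_regex = "/(?:http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+(?:\/(?:[^\s&]|&amp;)*)?/";
    $text = preg_replace_callback($url_regex, function($matches){
        // href gets the decoded URL (plain "&"); the link text stays HTML-encoded.
        return '<a href="'.htmlspecialchars_decode($matches[0]).'" rel="nofollow">'.$matches[0]."</a>";
    }, $text);
    return $text;
}

The first statement HTML-encodes the input.
The second defines the URL regex. It could be improved, but it works for now. The path part accepts &amp; but stops at any other entity; with a plain \S* the match would run through an encoded quote or angle bracket that directly follows a URL, and decoding would then put the raw character back into the href.
The third calls preg_replace_callback, which works like preg_replace, except that instead of a replacement string you supply a function that returns the replacement string.
The callback builds the link. htmlspecialchars_decode undoes the effect of htmlspecialchars, so an ampersand in the URL appears unencoded in the href, while the link text keeps the encoded form.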
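
For comparison, here is a variant sketch that is not the code above: it splits the raw text on URLs first and then escapes every piece separately, so nothing is ever decoded again. The function name linkify_escaped and the simplified URL pattern are assumptions made for this illustration. Unlike the output requested in the question, this variant writes &amp; inside the href as well; browsers decode entities in attribute values, so the link target is the same.

```php
<?php
// Hypothetical helper for illustration; a variant, not the answer's original code.
function linkify_escaped(string $text): string {
    // The outer capture group makes preg_split return the URLs as well.
    $url_regex = '~((?:https?|ftps?)://[a-zA-Z0-9.-]+(?:/[^\s"\'<>]*)?)~';
    // Fall back to treating everything as plain text if the split fails.
    $parts = preg_split($url_regex, $text, -1, PREG_SPLIT_DELIM_CAPTURE) ?: [$text];

    $out = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        // With one capture group, odd indexes hold the URLs, even indexes the text between them.
        $out .= ($i % 2 === 1)
            ? '<a href="' . $safe . '" rel="nofollow">' . $safe . '</a>'
            : $safe;
    }
    return $out;
}

$inp = "http://web.page/index.php?a1=hi&a2=hello\n<script src=\"http://bad-website.com/exploit.js\"></script>";
echo linkify_escaped($inp);
```

Because the URL pattern here runs on the raw input, it can exclude quotes and angle brackets directly, and each piece is escaped exactly once.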

randomdude999

Try this:

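// Escape only the URL, not the surrounding markup.
$urlEscaped = htmlspecialchars("http://web.page/index.php?a1=hi&a2=hello");
// Concatenate: variables are not interpolated inside single-quoted strings.
$aTag = '<a href="' . $urlEscaped . '">Hello</a>';
echo $aTag;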

Your example doesn't work because if you escape the whole HTML tag, the browser never processes the tag; it just displays it as plain text.

As you can see, Stack Overflow escapes our whole input (questions, answers, and so on), so we see the code as text instead of the browser processing it.
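
The difference between the two cases can be sketched as follows. The variable names $link and $wholeTagEscaped are made up for this illustration, and the URL is the one from the question.

```php
<?php
$url = "http://web.page/index.php?a1=hi&a2=hello";

// Case 1: escape only the URL, then place it inside real markup.
$urlEscaped = htmlspecialchars($url);
$link = '<a href="' . $urlEscaped . '">' . $urlEscaped . '</a>';

// Case 2: escape the finished tag. The angle brackets become entities,
// so the browser shows the markup as text instead of rendering a link.
$wholeTagEscaped = htmlspecialchars('<a href="' . $url . '">' . $url . '</a>');

echo $link, "\n", $wholeTagEscaped, "\n";
```

Only the first string contains a real <a> element; the second contains no "<" character at all.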

Boy
  • My example works perfectly. The link in the rendered output is an `<a>` tag. – randomdude999 Apr 15 '16 at 11:37
  • Do you mean that you see only 'Hello' (the processed tag)? – Boy Apr 15 '16 at 11:41
  • OK, so why not just decode the content with htmlspecialchars_decode()? – Boy Apr 15 '16 at 12:38
  • I can HTML-decode the URL (since that needs to be decoded, and possibly URL-encoded), but everything else has to stay encoded. Also, I have long text where not everything is a URL, as in the new example. But now I can't use a regex, since preg_replace can't call functions. – randomdude999 Apr 15 '16 at 12:54
  • I'm still confused about what you really want... At the end of your question, by saying "When viewed in a browser:", did you mean that this is what you want or what you don't want? What kind of XSS are you trying? – Boy Apr 15 '16 at 13:55
  • I mean, do you want a script to execute or not? – Boy Apr 15 '16 at 13:59
  • the 'viewed in a browser' is just the previous code without 4 spaces in the beginning (therefore stackoverflow renders it as html). – randomdude999 Apr 15 '16 at 14:03