The primary XSS vulnerability in (PHP) markdown after disallowing HTML tags seems to be that it allows links like this:
[foo](javascript:alert('xss'))
which will turn into
<a href="javascript:alert('xss')">foo</a>
and the same applies to <img src="">.
I'm currently developing a very basic Q&A section on a site, and I use markdown for the questions and the answers. I can quite confidently say that the only legitimate use of links on this site will be http:// or https:// links.
If I modified the regex markdown uses to process links and allowed only URLs beginning with the characters http, would that prevent XSS attacks?
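To make the idea concrete, here is a rough sketch of the check I have in mind, written as standalone PHP rather than as a patch to the markdown regex. The function names is_allowed_link_url and render_link are hypothetical and not part of PHP Markdown. The sketch is slightly stricter than "begins with http": it requires the full http:// or https:// prefix. It also escapes the URL before placing it in the attribute, on the assumption that a quote character inside the URL could otherwise break out of href.

```php
<?php
// Hypothetical helper (not part of PHP Markdown): accept only URLs that
// start with http:// or https://, compared case-insensitively.
function is_allowed_link_url(string $url): bool
{
    // ^ anchors at the very start, so leading whitespace, javascript:,
    // data:, and relative URLs are all rejected.
    return preg_match('#^https?://#i', $url) === 1;
}

// Hypothetical renderer showing where the check would sit.
function render_link(string $url, string $text): string
{
    $safeText = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    if (!is_allowed_link_url($url)) {
        // Disallowed scheme: emit the link text only, with no anchor tag.
        return $safeText;
    }

    // Assumption: escaping quotes stops the URL from closing the href
    // attribute early and injecting further attributes.
    $safeUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $safeUrl . '">' . $safeText . '</a>';
}
```

With this sketch, render_link("javascript:alert('xss')", 'foo') would return just the text foo, while an http:// or https:// URL is rendered as a normal anchor.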
P.S. This isn't part of my current question, but I would be much obliged if some kind soul showed me how to modify the frustratingly complex regex in question.
EDIT: I have already read PHP Markdown XSS Sanitizer, and the only reason I'm asking this question is that I'm considering an alternate approach. My question is not 'How to sanitize markdown output to prevent XSS?' but rather 'Will this approach prevent XSS attacks?' As such, it is not a duplicate; it is an alternative. Also, doesn't the fact that this question received upvotes show that there are at least some people who are wondering the same thing I am, even though the earlier question exists?