I have a string: `<p onclick="alert('abc')" style="color: black">text</p>`

I want to remove all inline JavaScript, such as `onclick`, `onchange`, and so on, leaving only the HTML and CSS. Is there any way to do this in C#? The only approach I can think of is to remove each JavaScript attribute from the string one by one.

Input: `<p onclick="alert('abc')" style="color: black">text</p>`

Output: `<p style="color: black">text</p>`

nsnd64
  • 1. Parse the HTML using a library like HtmlAgilityPack. 2. Loop through all the elements checking their attributes for inline js. 3. Remove said attributes. 4. Write new HTML to file. – ProgrammingLlama Apr 07 '22 at 04:36
  • https://stackoverflow.com/a/65947149/3181933 - this pretty much does what you want except for the names of the attributes being different. – ProgrammingLlama Apr 07 '22 at 04:37

2 Answers

You can use HtmlSanitizer to remove inline JavaScript from an HTML fragment.

For example, the following code

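using Ganss.Xss; // HtmlSanitizer NuGet package; older releases used the Ganss.XSS namespace

var sanitizer = new HtmlSanitizer();
// Test fragment: a <script> element, on* handlers, and a javascript: URL inside a style
var html = @"<script>alert('xss')</script><div onload=""alert('xss')"""
    + @" style=""background-color: test"">Test<img src=""test.gif"""
    + @" style=""background-image: url(javascript:alert('xss')); margin: 10px""><p onclick =""alert('abc')"" style =""color: black"">text</p></div>";
var sanitized = sanitizer.Sanitize(html);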

returns this output:

<div>Test<img src="test.gif" style="margin: 10px"><p style="color: rgba(0, 0, 0, 1)">text</p></div>

You can check this fiddle for more details.

user1672994

The best way is to use Html Agility Pack. I have linked the page you need in its documentation.

Use it like this:

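using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html); // html is the string holding the input markup
var pNode = htmlDoc.DocumentNode.SelectSingleNode("//p"); // null when there is no <p>
pNode?.Attributes.Remove("onclick");
var cleaned = htmlDoc.DocumentNode.OuterHtml; // the markup without the onclick attribute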

Here is the fiddle.
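
The snippet above removes only `onclick` from the first `<p>`. To cover every handler attribute, as the question asks, the steps from ProgrammingLlama's comment could be sketched roughly as follows. This assumes the HtmlAgilityPack NuGet package is referenced; the class name `InlineScriptStripper` and the method name `StripEventHandlers` are made up for illustration. It treats any attribute whose name starts with "on" as an event handler, which is a simplification.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class InlineScriptStripper
{
    public static string StripEventHandlers(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Walk every node in the parsed fragment.
        foreach (var node in doc.DocumentNode.Descendants().ToList())
        {
            // Snapshot the matching attributes first so the collection
            // is not modified while it is being enumerated.
            var handlers = node.Attributes
                .Where(a => a.Name.StartsWith("on", StringComparison.OrdinalIgnoreCase))
                .ToList();

            foreach (var attr in handlers)
            {
                node.Attributes.Remove(attr);
            }
        }

        // Serialize the modified document back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `InlineScriptStripper.StripEventHandlers(html)` on the input from the question should drop `onclick` and keep `style`. Note that this sketch does not touch `<script>` elements or `javascript:` URLs inside `href`, `src`, or `style` values, so for untrusted input a dedicated sanitizer such as HtmlSanitizer is probably the safer choice.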

Hamid Reza