How do I strip all attributes except "alt" and "src" from the HTML tags in a string, using Java?
Also, how do I get the values of all "src" attributes in the string?
:)
You can:
Whatever you do, don't try to do it with regular expressions.
OK, I solved this in the end.
I used the HtmlCleaner library to turn the input into well-formed markup.
Then I used a DOM parser to walk the whole tree and strip all disallowed tags and attributes.
(Plus some minor ugly hacks ;) )
It was quite a lot of work.
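For anyone landing here later, the DOM-walking step might look roughly like the sketch below. It is not the code from the answer above, just a minimal illustration under some assumptions: the input has already been cleaned into well-formed XML with a single root element (for example by HtmlCleaner, which is not shown here), and only the JDK's built-in DOM parser is used. The class name `AttributeStripper` and its method names are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; assumes the markup is already well-formed.
final class AttributeStripper {
    // Attributes that survive the clean-up.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("alt", "src"));

    // Parses well-formed markup into a DOM tree.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Walks the tree in document order, removes every attribute that is not
    // in ALLOWED, and appends each "src" value it finds to srcValues.
    static List<String> stripAndCollect(Node node, List<String> srcValues) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            NamedNodeMap attributes = element.getAttributes();
            // Iterate backwards: the map is live and shrinks as we remove.
            for (int i = attributes.getLength() - 1; i >= 0; i--) {
                Attr attribute = (Attr) attributes.item(i);
                String name = attribute.getName().toLowerCase(Locale.ROOT);
                if ("src".equals(name)) {
                    srcValues.add(attribute.getValue());
                }
                if (!ALLOWED.contains(name)) {
                    element.removeAttributeNode(attribute);
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            stripAndCollect(children.item(i), srcValues);
        }
        return srcValues;
    }

    // Serializes the (modified) tree back to a string.
    static String serialize(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<p id=\"intro\"><img src=\"a.png\" alt=\"A\" width=\"10\"/></p>";
        Document document = parse(xhtml);
        List<String> srcValues = stripAndCollect(document.getDocumentElement(), new ArrayList<>());
        System.out.println(serialize(document));
        System.out.println(srcValues);
    }
}
```

Real-world HTML is rarely well-formed, so the cleaning step still has to come first; this sketch only covers what happens once a parseable tree exists. Stripping disallowed tags would need an extra check on the element name in the same traversal.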