
I'm coding an app for Windows Phone in C#. The program creates an HTML file, and in the course of running it adds a lot of HTML tags.

Now I need to strip those tags from a string when needed.

All my searches show that I can split a string into an array and then put it back together minus any words I don't want. That is handy, but it won't work for my needs. I have no idea where to start, or even whether this is possible.

Here is an example of the strings I need to clean:

testString = "<a href=\"#AnotherTest\">AnotherTest</a><br>";

And this is the list of parts I need to remove:

List<string> partsToRemove = new List<string> {"</a>","\">","<br>","<a","href=\"#"};

So how do I take "<a href=\"#AnotherTest\">AnotherTest</a><br>" and remove all the parts included in partsToRemove?

To clarify: I will only be removing HTML from small strings as needed, not from a whole HTML file.

To give a working concept: my program creates a background for a roleplay character. Part of that process uses a "gang" generator, which provides strings with HTML tags already in place (adding them on the fly is not possible without radically altering my whole program). This is fine for the end result, BUT I also give users access to the generator itself, so if they just want a gang they can use what I have created. The result is then displayed in a textbox (I could easily change that to another web box) and, if enabled, the phone reads it out. So here I would take the string created for the gang and feed it through a method that strips the HTML code and returns a "clean" string.

Before posting I searched for a solution, but all I came across was how to remove whole words.

2 Answers


You can try using a regex to do this.

Remove all HTML tags:

String result = Regex.Replace(htmlDocument, @"<[^>]*>", String.Empty);
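A minimal sketch of wrapping that one-liner in the kind of "string in, clean string out" method the question describes. The class name HtmlStripper and the method name StripTags are made up for illustration; only Regex.Replace and the pattern come from the line above. It assumes the input is a small fragment with well-formed tags and no ">" inside attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HtmlStripper
{
    // "<", then any run of characters that are not ">", then ">".
    private static readonly Regex TagPattern = new Regex(@"<[^>]*>");

    // Returns the input with every tag-like span removed.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        return TagPattern.Replace(html, string.Empty);
    }
}
```

Called as HtmlStripper.StripTags("<a href=\"#AnotherTest\">AnotherTest</a><br>"), this should give back "AnotherTest". It does not decode entities such as &amp;amp;, so those would still need handling if the generator emits them.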
Martin Vich
  • I agree that it's not a good idea to parse a whole HTML page. But from my understanding of the question, the author wants to process just small HTML strings like "<a href=\"#AnotherTest\">AnotherTest</a><br>" and get "AnotherTest" from them.
    – Martin Vich Apr 15 '15 at 21:44
  • Yeah, it would not be the whole HTML page. It's tough to explain (which could be why people are clicking the down arrow); I'll edit the main post with more details. This solution did not work, but I have yet to read the regex docs, so I may just be misunderstanding it. (htmlDocument is the string we wish to convert, yes? So http://pastebin.com/LJih1Ms4 would mean I could send that method a string with HTML markup and get a clean one back? It doesn't seem to allow that at the moment.) – David Eastwick Apr 15 '15 at 22:00
  • Yes, htmlDocument is the string we wish to convert. What string didn't work for you? I've tried "<a href=\"#AnotherTest\">AnotherTest</a><br>" and it resulted in "AnotherTest".
    – Martin Vich Apr 15 '15 at 22:12

For the case you've shown, you can use this regex: /(<a|href=\\"#|">|</a>|<br>|\\)/gm. But since you might have many different types of tags, it is best to keep a list of patterns, or to try to work out a single pattern that matches all the combinations you have. It might be more suitable to split the document and run a regex multiple times, to keep each regex as simple as possible.
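To make the "list of patterns" idea concrete in C#, here is a rough sketch that builds one alternation from a list of literal parts. The names PartRemover and RemoveParts are invented for this example; Regex.Escape is used so that each part is matched as literal text rather than as regex syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class PartRemover
{
    // Removes every literal occurrence of each part from the input.
    public static string RemoveParts(string input, IEnumerable<string> parts)
    {
        // Longer parts go first so a short part cannot pre-empt a longer
        // one that starts at the same position.
        string pattern = string.Join(
            "|",
            parts.OrderByDescending(p => p.Length)
                 .Select(p => Regex.Escape(p)));
        return Regex.Replace(input, pattern, string.Empty);
    }
}
```

One caveat with this approach: with the partsToRemove list from the question, the test string comes out as " AnotherTestAnotherTest" (leading space, name twice), because the anchor name inside the href attribute is not one of the listed parts. Removing fixed parts only works if everything between "<" and ">" is covered by the list.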

Hope I've answered your question.

Matju