
I have to remove the span tags inside a string such as:

<span>Operation Gambling:</span><span>la mano della crimitalità </span><span>sull'azzardo</span>

To do this, I use the following regexp:

Regex.Replace(inHTML, "<span[^>]*?>", string.Empty).Replace("</span>", "&nbsp;</span>");

The result is sometimes correct, but in this case it is:

Operazione Gambling: la mano della crimitalità sull&nbsp;azzardo

As you can see, the single quote has been removed. How can I keep it by modifying the pattern?

Ras
  • Please show **both** the input and the output of the case that fails. We cannot see what quote has been removed. More generally, regular expressions are the wrong way to manipulate HTML, use a proper HTML parser. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – AdrianHHH Jul 27 '15 at 08:38
  • I don't think what you want can be done with a single regex. It is a 2 to 3 step process: 1. Remove the span tags at the beginning and end, consuming the whitespace around them. 2. Replace any run of span tags separated by whitespace, together with the spaces around them, with one single space. – maraca Jul 27 '15 at 08:49
  • ... ok I see you accepted, but this will give you `...Gambling:la mano...`, no space after the colon – maraca Jul 27 '15 at 08:59
  • Hi maraca, yes, in this case there's no blank space, but this is just an example. I'm sure that there are blank spaces between words. – Ras Jul 27 '15 at 09:03
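
The two-step process maraca describes can be sketched roughly as follows. This is only an illustration, not code from the thread: the `SpanStripper` class and the `StripSpans` method are hypothetical names, and the patterns assume that span attributes never contain a `>` character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpanStripper
{
    // One or more span tags (opening or closing), each with optional
    // whitespace around it. Assumes no '>' inside attribute values.
    private const string SpanRun = @"(?:\s*</?span\b[^>]*>\s*)+";

    // Hypothetical helper name, introduced for this sketch only.
    public static string StripSpans(string html)
    {
        // Step 1: drop span tags at the very start and the very end,
        // together with the whitespace around them.
        string trimmed = Regex.Replace(html, "^" + SpanRun + "|" + SpanRun + "$", string.Empty);

        // Step 2: collapse every remaining run of span tags (and the
        // whitespace around them) into a single space.
        return Regex.Replace(trimmed, SpanRun, " ");
    }

    public static void Main()
    {
        string input = "<span>Operation Gambling:</span><span>la mano della crimitalità </span><span>sull'azzardo</span>";
        Console.WriteLine(StripSpans(input));
    }
}
```

For the sample string from the question, `StripSpans` returns `Operation Gambling: la mano della crimitalità sull'azzardo`: the apostrophe is kept and there is exactly one space wherever two spans met.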

1 Answer


You can use this code to remove every HTML tag from your string:

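using System.Text.RegularExpressions;

var str = "<span>Operation Gambling:</span><span>la mano della crimitalità </span><span>sull'azzardo</span>";
// Match '<', any characters other than '>', then '>', and replace each tag with nothing.
string result = Regex.Replace(str, @"<[^>]*>", string.Empty);
System.Console.WriteLine(result);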

Or use this regex to remove only the span tags:

string withoutSpans = Regex.Replace(str, @"</?span( [^>]*|/)?>", String.Empty);
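
To make the difference between the two patterns concrete, here is a small sketch that runs both on the same string. The `TagRemovalDemo` class, its method names, and the input containing a `<b>` element are assumptions added for illustration; only the two patterns come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagRemovalDemo
{
    // Removes every tag: '<', any characters other than '>', then '>'.
    public static string RemoveAllTags(string html) =>
        Regex.Replace(html, @"<[^>]*>", string.Empty);

    // Removes only span tags: opening (with or without attributes),
    // closing, and self-closing forms.
    public static string RemoveSpanTags(string html) =>
        Regex.Replace(html, @"</?span( [^>]*|/)?>", string.Empty);

    public static void Main()
    {
        // Hypothetical input: the <b> element is not in the original question.
        string input = "<span class=\"x\">sull'<b>azzardo</b></span>";

        Console.WriteLine(RemoveAllTags(input));   // sull'azzardo
        Console.WriteLine(RemoveSpanTags(input));  // sull'<b>azzardo</b>
    }
}
```

Neither pattern touches the apostrophe, since both only match text between `<` and `>`.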
Sirwan Afifi