How can I replace a "<" and a ">" in the content of an XML file with the matching "&lt;" and "&gt;" (given a pre-known set of tags) using a regex?

Example: <abc>fd<jkh</abc><def>e>e</def> should result in: <abc>fd&lt;jkh</abc><def>e&gt;e</def>

It must be done with a regex (no loading the XML into a parser or the like).

David Basarab
Jack
  • What do you want to do with ` – SLaks Jan 25 '10 at 18:58
  • Thanks for replying! I have no attributes in this file. – Jack Jan 25 '10 at 19:00
  • @Jack: But are there any CDATA sections or comments? – kennytm Jan 25 '10 at 19:16
  • No, there are no CDATA sections or comments. – Jack Jan 25 '10 at 19:31
  • this is the example which will solve all the cases. ">sdfsdf<>asdasd<>asdasdasds<" only the and the matching are predefined. – Jack Jan 25 '10 at 19:32
  • -1 The failure to specify that you wanted to replace `<` and `>` on enclosed tags like `` that don't have a matching closing tag caused us to waste time on this question, which is clearly out of bounds for a reasonable regex solution. – Jay Jan 25 '10 at 19:37
  • Don't do it with a regex. Regardless of your situation, you're going to run into trouble: http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege – aronchick Jan 25 '10 at 20:09

2 Answers

I think the pattern

<([^>]*<)

will match a < that is followed by another < before any > (and is therefore not the start of a tag)

...and the pattern

(>[^<]*)>

will match a > that follows another > with no < in between (and is therefore not the end of a tag)

var first = Regex.Replace(@"<abc>fd<jkh</abc><def>e>e</def>",@"<([^>]*?<)",@"&lt;$1");
var final = Regex.Replace(first,@"(>[^<]*?)>",@"$1&gt;");

EDIT:

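This does work, but you have to pass over the input multiple times: each match consumes the < or > that ends it, so adjacent stray brackets such as << need another pass. I'm sure there's a purer method, but looping until nothing changes gets the job done.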

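using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var next = @"<abc>dffs<<df</abc>";
        string current;
        // Repeat both replacements until a pass changes nothing.
        do
        {
            current = next;
            // Escape a '<' that reaches another '<' before any '>'.
            next = Regex.Replace(current, @"<([^>]*?<)", @"&lt;$1");
            // Escape a '>' that follows another '>' with no '<' in between.
            next = Regex.Replace(next, @"(>[^<]*?)>", @"$1&gt;");
        } while(next != current);
        Console.WriteLine(current);
        Console.ReadKey();
    }
}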
Jay
  • @Jack you didn't give dffs< as an example. I think you should learn some RegEx so that you can take Jay's example and expand it. Remember we are supposed to point you down a path, not walk it for you. – David Basarab Jan 25 '10 at 19:17
  • this is my actual example, which will solve all the cases. ">sdfsdf<>asdasd<>asdasdasds<" only the and the matching are predefined. – Jack Jan 25 '10 at 19:29
  • this one is very good! but still didn't work with replacing the and <> ! – Jack Jan 25 '10 at 19:36
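s/<(?=[^<>]*<)/&lt;/g
s/(?<=>[^<>]*)>/&gt;/g

In C#,

// Escape a '<' that is followed by another '<' before any '>'.
var step1 = new Regex(@"<(?=[^<>]*<)").Replace(your_xml_string, "&lt;");
// Escape a '>' that is preceded by another '>' with no '<' in between.
var result = new Regex(@"(?<=>[^<>]*)>").Replace(step1, "&gt;");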

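Not tested; I don't have C# on hand. Note that the second pattern relies on a variable-length lookbehind, which .NET supports but many other regex engines do not.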
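For the variant raised in the question and comments, where only a pre-known set of tags should survive and every other < or > gets escaped, one possible approach is a single alternation that tries a known tag first and falls back to a lone bracket. The sketch below is only an illustration under stated assumptions: the class name `BracketEscaper`, the method `Escape`, and the tag list `abc`/`def` are hypothetical, and it relies on the file having no attributes, CDATA sections, or comments, as stated in the comments on the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketEscaper
{
    // Hypothetical whitelist: only these names are treated as real tags.
    private static readonly string[] KnownTags = { "abc", "def" };

    // Alternation order matters: a known opening or closing tag is tried
    // first, so a lone '<' or '>' only matches where no known tag starts.
    private static readonly Regex TagOrBracket = new Regex(
        "</?(?:"
        + string.Join("|", Array.ConvertAll(KnownTags, t => Regex.Escape(t)))
        + ")>|[<>]");

    public static string Escape(string xml)
    {
        return TagOrBracket.Replace(xml, m =>
        {
            // A match longer than one character is a known tag: keep it.
            if (m.Value.Length > 1) return m.Value;
            // Otherwise it is a stray bracket: escape it.
            return m.Value == "<" ? "&lt;" : "&gt;";
        });
    }
}
```

Under those assumptions, `BracketEscaper.Escape("<abc>fd<jkh</abc><def>e>e</def>")` returns `<abc>fd&lt;jkh</abc><def>e&gt;e</def>` in a single pass, and a tag that is not in the list, such as `<xyz>`, comes out as `&lt;xyz&gt;`.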

kennytm
  • Very good, but it didn't replace something like >dfddf and also didn't replace the non-predefined tags like sdfsdfdasf. There has to be some predefined tags; otherwise we don't get the replacement of the undefined tags... – Jack Jan 25 '10 at 19:23
  • @Jack: Why can't you write down all arguments at once in the question? And you can't expect to use 1 or 2 simple regex if you need to detect ``. – kennytm Jan 25 '10 at 20:24