
I am converting XML child elements into attributes of their parent element and have a dirty regex script I used in TextMate. I know that the dot (.) doesn't match newlines, so this is how I got it to work.

Search

language="(.*)"
(.*)<education>(.*)(\n)?(.*)?(\n)?(.*)?(\n)?(.*)?</education>
(.*)<years>(.*)</years>
(.*)<grade>(.*)</grade>

Replace

grade="$13" language="$1" years="$11">
        <education>$3$4$5$6$7$8$9</education>

I know there's a better way to do this. Please help me build my regex skills further.

rxgx
  • dup of http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags. – bmargulies Mar 26 '10 at 22:14
  • It would help if you could show a piece of the XML you have, and the XML you want it to be. I find it hard to understand what you are trying to do. – John Mar 26 '10 at 22:17
  • How do I write "(.*)(\n)?(.*)?(\n)?(.*)?(\n)?(.*)?" and "$3$4$5$6$7$8$9" in such a way that I don't have to duplicate myself for every possible newline and returned value? – rxgx Mar 26 '10 at 22:29
  • From a regex-perspective, replace `(.*)(\n)?(.*)?(\n)?(.*)?(\n)?(.*)?` with `[\s\S]*?`. But really, as already mentioned: don't parse xml with regex. – Bart Kiers Mar 26 '10 at 22:33
  • Thanks! I am converting this Word document to XML so I really can't parse it yet with ActionScript. – rxgx Mar 26 '10 at 23:25
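To see what Bart Kiers' `[\s\S]*?` suggestion looks like in practice, here is a rough sketch in Python (chosen only for illustration; the asker works in TextMate and ActionScript). It keeps the question's search layout but collapses the whole `(.*)(\n)?(.*)?...` chain into one lazy group. The `<entry>` sample is hypothetical because the question never shows its XML, and the `">` after the language value is an assumption about that input.

```python
import re

# Hypothetical sample: the question never shows its real XML, so this
# <entry> record is an assumption modelled on the question's regex.
SAMPLE = (
    '<entry language="French">\n'
    '    <education>Two years of high school\n'
    '    plus one college course</education>\n'
    '    <years>3</years>\n'
    '    <grade>B</grade>\n'
    '</entry>'
)

# Same layout as the TextMate search, except that the education chain is
# one lazy group: [\s\S] matches any character including newlines, and *?
# stops at the first </education>. The "> after the language value is an
# assumption about the input.
PATTERN = re.compile(
    r'language="(.*)">\n'
    r'(.*)<education>([\s\S]*?)</education>\n'
    r'(.*)<years>(.*)</years>\n'
    r'(.*)<grade>(.*)</grade>'
)

# With fewer groups the numbering shifts:
# 1 = language, 3 = education, 5 = years, 7 = grade.
REPLACEMENT = r'grade="\7" language="\1" years="\5">\n    <education>\3</education>'


def convert(text):
    """Apply the search/replace once over the whole text."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(convert(SAMPLE))
```

The replacement no longer needs `$3$4$5$6$7$8$9`: a single group carries the education text however many lines it spans.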

2 Answers


Use an XML parser; don't use regex to parse XML.
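As a rough illustration of the parser route, here is a sketch using Python's standard `xml.etree.ElementTree` (Python is only for illustration; the asker's target is ActionScript). It assumes the converted Word output is already well-formed XML, and the `<entry>` sample plus the `children_to_attributes` helper are hypothetical names, not anything from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question never shows its real XML, so this
# <entry> record is an assumption modelled on the question's regex.
SAMPLE = (
    '<entry language="French">\n'
    '    <education>Two years of high school\n'
    '    plus one college course</education>\n'
    '    <years>3</years>\n'
    '    <grade>B</grade>\n'
    '</entry>'
)


def children_to_attributes(xml_text, names=("years", "grade")):
    """Turn the named child elements of each <entry> into attributes."""
    root = ET.fromstring(xml_text)
    # Collect the entries first so the tree is not modified mid-iteration.
    for entry in list(root.iter("entry")):
        for name in names:
            child = entry.find(name)
            if child is not None:
                # child.text holds the element's full text, newlines
                # included, so nothing line-specific is needed here.
                entry.set(name, (child.text or "").strip())
                entry.remove(child)
    return root


if __name__ == "__main__":
    print(ET.tostring(children_to_attributes(SAMPLE), encoding="unicode"))
```

Because the parser understands element boundaries, multi-line content and attribute quoting are handled for you instead of being guessed at with `(.*)` groups.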

compie

If there are no other tags inside the <education> element, I would change that part to:

<education>([^<>]*)</education>

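If possible, I would use the same technique everywhere else you're using .*: unlike the dot, a negated character class such as [^<>]* also matches newlines, so the (\n)? groups become unnecessary. In the case of the language attribute, it would take this form: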

language="([^"]*)"
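Putting the negated-class idea together, a minimal sketch in Python (used only for illustration) might look like this. The `<entry>` sample is hypothetical since the question does not show its XML, and the `\s*` between elements is an assumption added to absorb line breaks and indentation instead of the question's `(.*)` prefix groups.

```python
import re

# Hypothetical sample: the question never shows its real XML, so this
# <entry> record is an assumption modelled on the question's regex.
SAMPLE = (
    '<entry language="French">\n'
    '    <education>Two years of high school\n'
    '    plus one college course</education>\n'
    '    <years>3</years>\n'
    '    <grade>B</grade>\n'
    '</entry>'
)

# Each field uses a negated character class instead of (.*):
# [^"]* stops at the closing quote and [^<>]* stops at the next tag.
# Unlike ".", both classes also match newlines, so multi-line <education>
# text needs no extra groups. \s* (an assumption) absorbs the line breaks
# and indentation between elements.
PATTERN = re.compile(
    r'language="([^"]*)">\s*'
    r'<education>([^<>]*)</education>\s*'
    r'<years>([^<>]*)</years>\s*'
    r'<grade>([^<>]*)</grade>'
)

# 1 = language, 2 = education, 3 = years, 4 = grade.
REPLACEMENT = r'grade="\4" language="\1" years="\3">\n    <education>\2</education>'


def convert(text):
    """Apply the search/replace once over the whole text."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(convert(SAMPLE))
```

This still breaks if <education> ever contains nested tags, which is where an XML parser becomes the safer choice.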
Alan Moore