The goal is to change the position of a specific XML tag (and its content) within a given string, ideally using regex.
The string (representing my XML data) is structured so that <MoveMe> elements appear before <Target> elements.

How can I move every <MoveMe>.*</MoveMe> and <xsi:MoveMe>.*</xsi:MoveMe> occurrence to just after the corresponding </Target> or </xsi:Target>?

input:

<?xml version="1.0"?>
<stylesheet version="1.0" xmlns:xsi="http://some.namespace.org">
    <template>
        <root>
            <body>
                <h2>sample</h2>
                <table>
                    <tr>
                        <th>Title</th>
                        <th>Artist</th>
                    </tr>
                    <MoveMe>Hans Müller fist
                        content 1 </MoveMe>
                    <Target>
                        <td>a1</td>
                        <td>b1</td>
                    </Target>
                </table>
                <table>
                    <tr><th>Title</th></tr>
                    <xsi:MoveMe>again</xsi:MoveMe>
                    <xsi:Target>
                        <td>x2</td>
                    </xsi:Target>
                </table>
            </body>
        </root>
    </template>
</stylesheet>

output:

<?xml version="1.0"?>
<stylesheet version="1.0" xmlns:xsi="http://some.namespace.org">
    <template>
        <root>
            <body>
                <h2>sample</h2>
                <table>
                    <tr>
                        <th>Title</th>
                        <th>Artist</th>
                    </tr>
                    <Target>
                        <td>a1</td>
                        <td>b1</td>
                    </Target>
                    <MoveMe>Hans Müller fist
                        content 1 </MoveMe>
                </table>
                <table>
                    <tr><th>Title</th></tr>
                    <xsi:Target>
                        <td>x2</td>
                    </xsi:Target>
                    <xsi:MoveMe>again</xsi:MoveMe>
                </table>
            </body>
        </root>
    </template>
</stylesheet>

So far I have managed to capture all groups of MoveMe nodes using this pattern:
s_pat = "(<(xsi:)?MoveMe>(.*?)<\/(xsi:)?MoveMe>)"

Note that the <table> element can occur multiple times, but each table contains only a single MoveMe element and a single Target element.
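
As a side note on the pattern above, here is a minimal sketch of how it behaves in Python, assuming the standard `re` module is used. The `sample` string is a shortened, hypothetical stand-in for the real data. It suggests that the multi-line MoveMe element is only captured when `re.DOTALL` is set, because `.` does not match newlines by default.

```python
import re

# The pattern from the question; "/" needs no escaping in a Python regex.
s_pat = r"(<(xsi:)?MoveMe>(.*?)</(xsi:)?MoveMe>)"

# Hypothetical, shortened sample containing one multi-line and one single-line element.
sample = (
    "<MoveMe>Hans Müller fist\n"
    "    content 1 </MoveMe>\n"
    "<xsi:MoveMe>again</xsi:MoveMe>\n"
)

# Without DOTALL, "." stops at newlines, so only the single-line element matches.
single_line = [m.group(1) for m in re.finditer(s_pat, sample)]

# With DOTALL, "." also matches newlines, so both elements match.
multi_line = [m.group(1) for m in re.finditer(s_pat, sample, flags=re.DOTALL)]

print(len(single_line), len(multi_line))
```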

modzello86
  • Parsing XML with regex is similar to parsing HTML with regex. I will refer you to this other SO question for the answer to that: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – grepe Mar 09 '17 at 16:11
  • Well, this is quite a custom case we are dealing with here, so we can consider this just as a string, I guess... Another option would be to parse this into an ElementTree object and manipulate the child order inside, but that seems to be another story. – modzello86 Mar 09 '17 at 16:26
  • Are _moveme_ and _target_ adjacent? –  Mar 09 '17 at 16:39

1 Answer

If it's this simple, something like this should work:

Find
(?s)([^\S\r\n]*<MoveMe>.*?</MoveMe>[^\S\r\n]*(?:\r?\n)?)(.*?<Target>.*?</Target>[^\S\r\n]*(?:\r?\n)?)
Replace
$2$1
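
The find/replace above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: it uses the standard `re` module, writes the `$2$1` replacement in Python's backslash syntax, and adds an optional `xsi:` prefix group (my addition, not part of the pattern above) so that the prefixed elements from the question are handled too. The function name `move_after_target` and the shortened sample are hypothetical.

```python
import re

# Group 1: the MoveMe element with its leading indentation and trailing newline.
# Group 2: the optional "xsi:" prefix (an assumed extension), reused via \2 so
#          that MoveMe and Target must carry the same prefix.
# Group 3: everything up to and including the matching Target element.
PATTERN = re.compile(
    r"([^\S\r\n]*<((?:xsi:)?)MoveMe>.*?</\2MoveMe>[^\S\r\n]*(?:\r?\n)?)"
    r"(.*?<\2Target>.*?</\2Target>[^\S\r\n]*(?:\r?\n)?)",
    re.DOTALL,  # same effect as the inline (?s) flag
)


def move_after_target(xml_text):
    """Swap each MoveMe block with the Target block that follows it."""
    return PATTERN.sub(r"\3\1", xml_text)


# Hypothetical, shortened version of the input from the question.
xml_in = (
    "<table>\n"
    "    <MoveMe>first\n"
    "        content 1 </MoveMe>\n"
    "    <Target>\n"
    "        <td>a1</td>\n"
    "    </Target>\n"
    "</table>\n"
    "<table>\n"
    "    <xsi:MoveMe>again</xsi:MoveMe>\n"
    "    <xsi:Target>\n"
    "        <td>x2</td>\n"
    "    </xsi:Target>\n"
    "</table>\n"
)

xml_out = move_after_target(xml_in)
print(xml_out)
```

Like the original pattern, this treats the document as plain text: it assumes the tags carry no attributes and that a Target element is never nested inside another Target.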

More generally, a regex that parses individual tags is given below, though it may be more than you need.

It only matches a single, atomic tag. Handling nesting and matching closing tags
is another issue entirely.

<(?:(?:(?:(script|style|object|embed|applet|noframes|noscript|noembed)(?:\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+)?\s*>)[\S\s]*?</\1\s*(?=>))|(?:/?[\w:]+\s*/?)|(?:[\w:]+\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]?)+\s*/?)|\?[\S\s]*?\?|(?:!(?:(?:DOCTYPE[\S\s]*?)|(?:\[CDATA\[[\S\s]*?\]\])|(?:--[\S\s]*?--)|(?:ATTLIST[\S\s]*?)|(?:ENTITY[\S\s]*?)|(?:ELEMENT[\S\s]*?))))>