-1

I am trying to use regular expressions in Sublime Text 3 to remove all the content between two strings in an XML file.

Suppose this is my content:

        <Body name="ground">
            <mass>0</mass>
            <mass_center> 0 0 0</mass_center>
            <inertia_xx>0</inertia_xx>
            <inertia_yy>0</inertia_yy>
            <inertia_zz>0</inertia_zz>
            <inertia_xy>0</inertia_xy>
            <inertia_xz>0</inertia_xz>
            <inertia_yz>0</inertia_yz>
            <!--Joint that connects this body with the parent body.-->
            <Joint />
            <VisibleObject>
                <!--Set of geometry files and associated attributes, allow .vtp, .stl, .obj-->
                <GeometrySet>
                    <objects />
                    <groups />
                </GeometrySet>
                <!--Three scale factors for display purposes: scaleX scaleY scaleZ-->
                <scale_factors> 1 1 1</scale_factors>
                <!--transform relative to owner specified as 3 rotations (rad) followed by 3 translations rX rY rZ tx ty tz-->
                <transform> -0 0 -0 0 0 0</transform>
                <!--Whether to show a coordinate frame-->
                <show_axes>false</show_axes>
                <!--Display Pref. 0:Hide 1:Wire 3:Flat 4:Shaded Can be overriden for individual geometries-->
                <display_preference>4</display_preference>
            </VisibleObject>
            <WrapObjectSet>
                <objects />
                <groups />
            </WrapObjectSet>
        </Body>

Now suppose I want to remove all the content between <VisibleObject> and </VisibleObject> to leave only:

        <Body name="ground">
            <mass>0</mass>
            <mass_center> 0 0 0</mass_center>
            <inertia_xx>0</inertia_xx>
            <inertia_yy>0</inertia_yy>
            <inertia_zz>0</inertia_zz>
            <inertia_xy>0</inertia_xy>
            <inertia_xz>0</inertia_xz>
            <inertia_yz>0</inertia_yz>
            <!--Joint that connects this body with the parent body.-->
            <Joint />
            <VisibleObject>
            </VisibleObject>
            <WrapObjectSet>
                <objects />
                <groups />
            </WrapObjectSet>
        </Body>

There are a few similar threads and problems to the above, but none of them seem to work particularly well (or at all) for this problem.

Any help would be most appreciated.

Astrid
  • How about Find `(<VisibleObject>)[\S\s]*?(</VisibleObject>)`, Replace `$1$2`? –  Aug 15 '16 at 18:20
  • With Notepad++, you can use XSLT transformations, and easily modify any XML. Not sure if Sublime Text supports modifying XML files with XSLT. – Wiktor Stribiżew Aug 15 '16 at 19:22
  • Indeed, as @WiktorStribiżew mentions, this can easily be handled with XSLT by running an identity transform and then re-writing the `<VisibleObject>` element. Also, it is highly advised not to use regex on [X/HTML documents](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – Parfait Aug 15 '16 at 21:14
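
The find/replace from the first comment can be tried outside the editor. Below is a minimal Python sketch, assuming the two capture groups in that pattern wrap the opening and closing `<VisibleObject>` tags. The sample XML is a shortened, hypothetical version of the question's file, and Python's `re` module stands in for Sublime's engine, so behaviour in the editor may differ in details.

```python
import re

# Hypothetical, shortened sample; tag names are taken from the question.
xml = """<Body name="ground">
    <Joint />
    <VisibleObject>
        <GeometrySet>
            <objects />
        </GeometrySet>
        <show_axes>false</show_axes>
    </VisibleObject>
    <WrapObjectSet>
        <objects />
    </WrapObjectSet>
</Body>
"""

# Group 1 keeps the opening tag, group 2 keeps the closing tag.
# [\S\s]*? lazily matches everything in between, newlines included.
pattern = re.compile(r"(<VisibleObject>)[\S\s]*?(</VisibleObject>)")

# Python writes the backreferences as \1\2 where the comment uses $1$2.
emptied = pattern.sub(r"\1\2", xml)
print(emptied)
```

The lazy `*?` matters here: a greedy `*` would run to the last `</VisibleObject>` in the file and swallow everything between separate blocks.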

2 Answers

3

An image of the Sublime window:

[Image: Sublime Regex]

You can open it via Find, then Replace. Make sure the leftmost option (regular expression mode) is ticked.

Jan
1

Sublime appears to use PCRE, according to this page.

That means you should be able to use the features PCRE offers, most usefully negative look-ahead. Compared with a lazy match-anything pattern, this can speed up matching considerably.

The regex I recommend is:

<VisibleObject>(?:[^<]*(?!</VisibleObject).)+</VisibleObject>

Essentially, the negative look-ahead ensures that whenever a < is present (namely at the start of a tag), it's not the closing </VisibleObject>.

The . consumes the < once the look-ahead has passed. It is also what lets the engine back off one character when the look-ahead sees the closing tag: [^<]* gives up its last character and the . takes it instead.

You will need to use the replacement <VisibleObject></VisibleObject>.
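
To sanity-check the pattern outside the editor, here is a minimal Python sketch. It assumes Python's `re` module behaves like Sublime's engine for this pattern, and the two-body sample XML is hypothetical (shortened from the question) to show that each block is matched separately.

```python
import re

# Hypothetical sample with two separate <VisibleObject> blocks.
xml = """<Body name="a">
    <VisibleObject>
        <scale_factors> 1 1 1</scale_factors>
        <show_axes>false</show_axes>
    </VisibleObject>
    <WrapObjectSet>
        <objects />
    </WrapObjectSet>
</Body>
<Body name="b">
    <VisibleObject>
        <display_preference>4</display_preference>
    </VisibleObject>
</Body>
"""

# [^<]* skips text that cannot start a tag; at each "<" the negative
# look-ahead rejects the closing tag, and "." consumes the character.
pattern = re.compile(
    r"<VisibleObject>(?:[^<]*(?!</VisibleObject).)+</VisibleObject>"
)

# subn returns the new text plus the number of replacements made.
result, count = pattern.subn("<VisibleObject></VisibleObject>", xml)
print(count)
print(result)
```

Because the replacement is a fixed string, the emptied element ends up on one line. Keeping the original line break and indentation would need capture groups around the parts to preserve.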

Laurel