Remove XML Tag and Content in XML String using Java Regex

Question

I have an XML string of about 400 lines, and it contains the tags below, repeated twice. I want to remove those tags along with their content.

<Address>
<Location>Beach</Location>
<Dangerous>
    <Flag>N</Flag>
</Dangerous>
</Address>

I am using the regex pattern below, but it isn't replacing anything:

xmlRequest.replaceAll("<Address>.*?</Address>$","");

I am able to do this in Notepad++ by ticking the ". matches newline" checkbox next to the Regular Expression radio button in the Find/Replace dialog box.

Can anyone suggest what's wrong with my regular expression?

Once again: do **not** process XML/HTML with regexes. Use XML tools. XML/HTML is a context-free language, a regular expression is not the right tool to process such languages. Only regular languages can be processed with regexes. — Willem Van Onsem, Mar 20 '17 at 01:42
Indeed - please read http://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la — James Fry, Mar 20 '17 at 01:43
@efektive, I need to completely remove that block inside the 400 lines of xml string — vkrams, Mar 20 '17 at 02:07

score 8 · Accepted Answer · edited Mar 20 '17 at 03:13

xmlRequest.replaceAll("<Address>[\\s\\S]*?</Address>","");

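By default, . does not match line terminators such as \n and \r, so you need [\s\S] to match every character, newlines included. The trailing $ is dropped as well, because it anchors the match to the end of the input.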
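An equivalent way to get the same effect is DOTALL mode, which is what the Notepad++ checkbox corresponds to. The following is a minimal sketch, not the answerer's code: the class name DotAllDemo, the helper stripAddress, and the sample XML are made up for illustration. It also shows that the result has to be assigned, since Java strings are immutable.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: removes every <Address>...</Address> block.
    // Pattern.DOTALL (same as the inline flag (?s)) lets '.' match line terminators;
    // the lazy quantifier *? stops at the nearest closing tag.
    private static final Pattern ADDRESS =
            Pattern.compile("<Address>.*?</Address>", Pattern.DOTALL);

    static String stripAddress(String xml) {
        return ADDRESS.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<Root>\n<Address>\n<Location>Beach</Location>\n</Address>\n"
                + "<Name>A</Name>\n</Root>";

        // Without DOTALL, '.' cannot cross '\n', so nothing matches.
        String unchanged = xml.replaceAll("<Address>.*?</Address>", "");
        System.out.println(unchanged.equals(xml)); // prints true

        // replaceAll returns a new string; the original is never modified.
        String cleaned = stripAddress(xml);
        System.out.println(cleaned);
    }
}
```

The [\s\S] form and the DOTALL form behave the same for this pattern; which one to use is a matter of taste.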

answered Mar 20 '17 at 02:27 by Kerwin; edited Mar 20 '17 at 03:13 by vkrams

Works fine Kerwin. Thank you – vkrams Mar 20 '17 at 03:14
No, it doesn't work fine. It works on the one test case that you have applied it to. It will fail on other test cases, and whoever has to investigate the bug will curse the person who wrote the code. Do not use regular expressions to process XML, use an XML parser. – Michael Kay Mar 21 '17 at 09:28
To expand on this, here are some cases it won't handle correctly: An address element with attributes. An address element with whitespace in the start or end tag. An address element containing a nested Address element. Address tags appearing within comments or CDATA sections. An empty Address element using a self-closing tag. – Michael Kay Mar 21 '17 at 09:36
Hopefully the developer can think for themselves to determine whether this is valid to use or not. How about unit tests? How about me wanting to remove passwords from SOAP requests before logging? Not everything is critical. – Matt D. Mar 12 '19 at 13:33
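Some of the failure cases listed in the comments above are easy to reproduce. This is a small sketch that wraps the accepted answer's regex in a hypothetical helper (the class name RegexLimits, the method stripAddress, and the sample inputs are all invented for the demonstration):

```java
public class RegexLimits {
    // The regex from the accepted answer, wrapped in a hypothetical helper.
    static String stripAddress(String xml) {
        return xml.replaceAll("<Address>[\\s\\S]*?</Address>", "");
    }

    public static void main(String[] args) {
        // Start tag with an attribute: the literal "<Address>" never matches,
        // so the input comes back unchanged.
        String withAttribute = "<r><Address type=\"home\"><Flag>N</Flag></Address></r>";
        System.out.println(stripAddress(withAttribute).equals(withAttribute)); // prints true

        // Self-closing element: there is no "</Address>" to match against.
        String selfClosing = "<r><Address/></r>";
        System.out.println(stripAddress(selfClosing).equals(selfClosing)); // prints true

        // Nested elements: the lazy match ends at the first closing tag,
        // leaving a dangling "</Address>" and a document that is no longer well-formed.
        String nested = "<r><Address><Address>x</Address></Address></r>";
        System.out.println(stripAddress(nested)); // prints <r></Address></r>
    }
}
```

Whether these cases matter depends on where the XML comes from; for a fixed, known payload the regex may be good enough, which is the point the last comment makes.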

score 0 · Answer 2 · edited May 23 '17 at 12:25

As improper as it may be to do what you're suggesting (see https://stackoverflow.com/a/1732454/6552039 for hilarity and enlightenment), there is a parser-based alternative.

You should be able to ingest your XML with an org.w3c.dom.Document parser, call getElementsByTagName("Address"), and remove the second match through its parent with getParentNode().removeChild(...) (assuming a particular interpretation of "below tags repeated twice").
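The DOM approach described above could look roughly like the sketch below. It is only one way to do it: the class name DomAddressRemover and the method removeElements are hypothetical, and it removes every matching element rather than only the second one, which is what the asker's follow-up comment seems to want.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomAddressRemover {
    // Hypothetical helper: parses the XML, removes every element named tagName,
    // and serializes the document back to a string.
    static String removeElements(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so walk it backwards:
            // removing a node then never shifts an index we still have to visit.
            NodeList matches = doc.getElementsByTagName(tagName);
            for (int i = matches.getLength() - 1; i >= 0; i--) {
                Node node = matches.item(i);
                node.getParentNode().removeChild(node);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Malformed input or a misconfigured parser ends up here.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Root><Address type=\"home\"><Location>Beach</Location></Address>"
                + "<Name>A</Name><Address/></Root>";
        System.out.println(removeElements(xml, "Address"));
    }
}
```

Unlike the regex, this handles attributes, self-closing tags, and nesting, because the parser rather than a pattern decides where an element starts and ends. To remove only the second occurrence, take matches.item(1) instead of looping.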

answered Mar 20 '17 at 02:07 by b4n4n4p4nd4; edited May 23 '17 at 12:25 by Community

score 0 · Answer 3 · answered Jan 21 '18 at 12:05

A solution with JSoup

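import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class RemoveAddress {
    public static void main(String[] args) {
        String xmlContent = "<Address> <Location>Beach</Location><Dangerous> "
                + "<Flag>N</Flag> </Dangerous> </Address>";
        String tagToRemove = "Address";

        // Use the XML parser: the default HTML parser lowercases tag names
        // and wraps the input in <html><body>.
        Document doc = Jsoup.parse(xmlContent, "", Parser.xmlParser());
        // Elements.remove() detaches every matched element from the document.
        doc.getElementsByTag(tagToRemove).remove();
        xmlContent = doc.outerHtml();
    }
}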
