
I have an XML string of about 400 lines in which the block of tags below appears twice. I want to remove those blocks.

<Address>
<Location>Beach</Location>
<Dangerous>
    <Flag>N</Flag>
</Dangerous>
</Address>

I am using the regex pattern below, but nothing gets replaced:

xmlRequest.replaceAll("<Address>.*?</Address>$","");

I can do this in Notepad++ by ticking the ". matches newline" checkbox next to the Regular expression radio button in the Find/Replace dialog box.

Can anyone suggest what's wrong with my regular expression?

Willem Van Onsem
vkrams
  • Once again: do **not** process XML/HTML with regexes. Use XML tools. XML/HTML is a context-free language; a regular expression is not the right tool to process such languages. Only regular languages can be processed with regexes. – Willem Van Onsem Mar 20 '17 at 01:42
  • Indeed - please read http://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la – James Fry Mar 20 '17 at 01:43
  • Jsoup seems like a good option – bichito Mar 20 '17 at 01:49
  • Could you post the expected output? – bichito Mar 20 '17 at 01:52
  • @efektive, I need to completely remove that block inside the 400 lines of xml string – vkrams Mar 20 '17 at 02:07

3 Answers

xmlRequest = xmlRequest.replaceAll("<Address>[\\s\\S]*?</Address>", "");

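By default, . does not match line terminators such as \n and \r, so .* stops at the end of the line. Use [\s\S] to match any character, including line breaks. The trailing $ in the original pattern should also be dropped, because it anchors the match to the end of the input.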
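
A minimal sketch of the difference, assuming the XML is held in a plain Java String. The class name, method names, and sample input are made up for illustration. It compares the pattern from the question with the [\s\S] variant and with the inline (?s) flag (DOTALL), which is another way to let . match line terminators. Note that replaceAll returns a new string, so the result has to be assigned or returned.

```java
public class AddressRegexDemo {
    // Pattern from the question: '.' stops at line breaks and '$' anchors to the end of input.
    static String removeWithOriginalPattern(String xml) {
        return xml.replaceAll("<Address>.*?</Address>$", "");
    }

    // [\s\S] matches any character, including line terminators.
    static String removeWithCharClass(String xml) {
        return xml.replaceAll("<Address>[\\s\\S]*?</Address>", "");
    }

    // Alternative: the inline (?s) flag (DOTALL) lets '.' match line terminators.
    static String removeWithDotAll(String xml) {
        return xml.replaceAll("(?s)<Address>.*?</Address>", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input with the block spread over several lines.
        String xml = "<Root>\n<Address>\n<Location>Beach</Location>\n</Address>\n<Name>A</Name>\n</Root>";

        // The original pattern finds no match, so the string comes back unchanged.
        System.out.println(removeWithOriginalPattern(xml).equals(xml));
        System.out.println(removeWithCharClass(xml));
        System.out.println(removeWithDotAll(xml));
    }
}
```

Both working variants use a lazy quantifier (*?), so each Address block is matched separately instead of everything from the first start tag to the last end tag.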

vkrams
Kerwin
  • Works fine Kerwin. Thank you – vkrams Mar 20 '17 at 03:14
  • No, it doesn't work fine. It works on the one test case that you have applied it to. It will fail on other test cases, and whoever has to investigate the bug will curse the person who wrote the code. Do not use regular expressions to process XML, use an XML parser. – Michael Kay Mar 21 '17 at 09:28
  • To expand on this, here are some cases it won't handle correctly: An address element with attributes. An address element with whitespace in the start or end tag. An address element containing a nested Address element. Address tags appearing within comments or CDATA sections. An empty Address element using a self-closing tag. – Michael Kay Mar 21 '17 at 09:36
  • Hopefully the developer can think for themselves to determine whether this is valid to use or not. How about unit tests? How about me wanting to remove passwords from SOAP requests before logging? Not everything is critical. – Matt D. Mar 12 '19 at 13:33
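
To make some of the failure cases listed in the comments above concrete, here is a small sketch. The class name and the sample inputs are hypothetical; the pattern is the one from the answer.

```java
public class AddressRegexLimits {
    // The regex from the answer above.
    static String strip(String xml) {
        return xml.replaceAll("<Address>[\\s\\S]*?</Address>", "");
    }

    public static void main(String[] args) {
        // Attribute on the start tag: the literal "<Address>" never matches, so nothing is removed.
        String withAttribute = "<Address type=\"home\"><Location>Beach</Location></Address>";
        System.out.println(strip(withAttribute));

        // Nested element: the lazy match stops at the first </Address>, leaving a stray end tag.
        String nested = "<Address><Address>inner</Address></Address>";
        System.out.println(strip(nested));

        // Self-closing element: there is no end tag to match, so the input is unchanged.
        String selfClosing = "<Root><Address/></Root>";
        System.out.println(strip(selfClosing));
    }
}
```

Whether these cases matter depends on how tightly the input format is controlled, which is the trade-off the comments are arguing about.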

What you're suggesting is not the proper way to do this. (See https://stackoverflow.com/a/1732454/6552039 for hilarity and enlightenment.)

You should be able to parse your XML into an org.w3c.dom.Document, call getElementsByTagName("Address") on it, and remove the second element by calling removeChild(Node) on its parent node. (This assumes a particular interpretation of "below tags repeated twice".)
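
A minimal sketch of that DOM approach using only the JDK's built-in XML APIs. The class name, the removeAddresses helper, and the sample input are hypothetical. This version removes every Address element, which matches the asker's later comment; to remove only the second occurrence, call removeChild on addresses.item(1) instead of looping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AddressDomRemover {
    // Hypothetical helper: removes every <Address> element and returns the serialized XML.
    static String removeAddresses(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList addresses = doc.getElementsByTagName("Address");
            // The NodeList is live, so iterate backwards while removing nodes.
            for (int i = addresses.getLength() - 1; i >= 0; i--) {
                Node address = addresses.item(i);
                address.getParentNode().removeChild(address);
            }

            // Serialize the modified document back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrap checked parser/transformer exceptions to keep the sketch short.
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Root><Address><Location>Beach</Location></Address><Name>A</Name></Root>";
        System.out.println(removeAddresses(xml));
    }
}
```

Because the parser works on elements rather than text, attributes on the start tag, nested Address elements, and self-closing tags are handled without special cases.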

Community
b4n4n4p4nd4

A solution with JSoup

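import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public static void main(String[] args) {
    String xmlContent = "<Address> <Location>Beach</Location><Dangerous> "
            + "<Flag>N</Flag> </Dangerous> </Address>";

    String tagToRemove = "Address";

    // Use jsoup's XML parser so the input is not wrapped in an HTML document.
    Document doc = Jsoup.parse(xmlContent, "", Parser.xmlParser());

    // Detach every element with the given tag name from the document.
    for (Element el : doc.getElementsByTag(tagToRemove)) {
        el.remove();
    }

    // Serialize what is left of the document.
    xmlContent = doc.outerHtml();
}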
Raju