I have the following XML:

<customer>
   <name>Customer name</name>
   <address>
      <postalcode>94510</postalcode>
      <town>Green Bay</town>
   </address>
   <phone>0645878787</phone>
</customer>

Using only a regex, I would like to replace the whole <address>..</address> element with an empty string if the postal code is 94510.

I have

String s = "<the xml above here/>";
s = s.replace(source, target);

I only have control over "source" and "target". Is there a regular expression that can solve this problem?

Thank you

Simo L.
  • Regexes are not the right tool for this because XML is not a regular language. Java has XML-processing facilities; why don't you want to use those? – Wintermute Apr 15 '15 at 08:54
  • You can find lots of options here: http://www.tutorialspoint.com/java_xml/java_xml_parsers.htm. Are you totally sure you cannot use an XML parser, and only need a regex? The regex is `(?s)<address>.*?94510.*?</address>\\s*`, and the replacement string is ''. However, if your XML is malformed, you might get unexpected results. – Wiktor Stribiżew Apr 15 '15 at 08:55
  • Note that `replace` accepts a regular String, `replaceAll` accepts a regex. – Maroun Apr 15 '15 at 08:55

2 Answers

As has already been stated, please do not use regular expressions to process XML. Below is the approach you should take (code adapted from here and here):

String str = "<customer>\n" +
             "   <name>Customer name</name>\n" +
             "   <address>\n" +
             "      <postalcode>94510</postalcode>\n" +
             "      <town>Green Bay</town>\n" +
             "   </address>\n" +
             "   <phone>0645878787</phone>\n" +
             "</customer>";

// Parse the string into a DOM document
ByteArrayInputStream bais = new ByteArrayInputStream(str.getBytes());
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(bais);

//optional, but recommended
//read this - http://stackoverflow.com/questions/13786607/normalization-in-dom-parsing-with-java-how-does-it-work
doc.getDocumentElement().normalize();

System.out.println("Root element :" + doc.getDocumentElement().getNodeName());

// getElementsByTagName returns a live list, so iterate backwards:
// removing an element then does not shift the items still to be visited
NodeList nList = doc.getElementsByTagName("address");
for (int i = nList.getLength() - 1; i >= 0; i--)
{
    Node address = nList.item(i);
    NodeList children = address.getChildNodes();
    for (int j = 0; j < children.getLength(); j++)
    {
        Node current = children.item(j);
        if ("postalcode".equals(current.getNodeName()) && "94510".equals(current.getTextContent()))
        {
            address.getParentNode().removeChild(address);
            break; // this address is gone; stop inspecting its children
        }
    }
}

// Serialize the modified document back into a string
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);

String xmlString = result.getWriter().toString();
System.out.println(xmlString);

Which yields:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<customer>
   <name>Customer name</name>

   <phone>0645878787</phone>
</customer>

If you really, really must use regular expressions, take a look at the snippet below. Note that the pattern assumes the <address> element is followed by a <phone> element:

String str = "<customer>\n" +
             "   <name>Customer name</name>\n" +
             "   <address>\n" +
             "      <postalcode>94510</postalcode>\n" +
             "      <town>Green Bay</town>\n" +
             "   </address>\n" +
             "   <phone>0645878787</phone>\n" +
             "</customer>";

System.out.println(str.replaceAll("(?s)<address>.+?<postalcode>94510</postalcode>.+?</address>.+?<phone>", "<phone>"));

Yields:

<customer>
   <name>Customer name</name>
   <phone>0645878787</phone>
</customer>
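
If the only things you control are the pattern and the replacement, a somewhat more defensive variant might look like the sketch below. It is not a substitute for a parser. It assumes Java 8 or later (for `\R`), that <address> elements are never nested and carry no attributes, and that the postal code has no surrounding whitespace. The class and method names (`AddressRegexDemo`, `stripAddress`) are made up for illustration. The `(?:(?!</?address>).)*?` part keeps the match from running across the boundary of another <address> element, so an earlier address with a different postal code is left alone. Also keep in mind that `String.replace` treats its first argument literally; the pattern only works with `replaceAll` (or a `Pattern`).

```java
import java.util.regex.Pattern;

class AddressRegexDemo {
    // Hypothetical pattern: one <address> element containing postal code 94510.
    // [ \t]* and \R? also swallow the indentation and the trailing line break.
    static final String SOURCE =
        "(?s)[ \\t]*<address>(?:(?!</?address>).)*?"
        + "<postalcode>94510</postalcode>"
        + "(?:(?!</?address>).)*?</address>\\R?";

    static final Pattern ADDRESS_94510 = Pattern.compile(SOURCE);

    // Removes every matching <address> element from the given XML text.
    static String stripAddress(String xml) {
        return ADDRESS_94510.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<customer>\n"
                   + "   <name>Customer name</name>\n"
                   + "   <address>\n"
                   + "      <postalcode>94510</postalcode>\n"
                   + "      <town>Green Bay</town>\n"
                   + "   </address>\n"
                   + "   <phone>0645878787</phone>\n"
                   + "</customer>";
        System.out.println(stripAddress(xml));
    }
}
```

With only "source" and "target" available, `SOURCE` would be the pattern and the empty string the replacement, provided the call is changed to `replaceAll`.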
npinti

The most straightforward way I can see to do this without external libraries is to use an XPath expression to select the nodes that should be deleted, and to then delete them. This is fairly verbose in Java but not terribly complicated:

import java.io.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;
import org.w3c.dom.*;

public class Foo {
  // Error handling should be done, but I can't know what you want to happen
  // in case of broken XML.
  public static void main(String[] args) throws Exception {
    String xml =
        "<customer>\n"
      + "   <name>Customer name</name>\n"
      + "   <address>\n"
      + "      <postalcode>94510</postalcode>\n"
      + "      <town>Green Bay</town>\n"
      + "   </address>\n"
      + "   <phone>0645878787</phone>\n"
      + "</customer>";

    // XPath expression: It selects all address nodes under /customer
    // that have a postalcode child whose text is 94510
    String selection = "/customer/address[postalcode=94510]";

    // Lots of fluff -- the XML API is full of factories; don't mind them.
    // What all this does is to parse the document from the string.
    InputStream     source   = new ByteArrayInputStream(xml.getBytes());
    Document        document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);

    // Create a list of nodes that match our XPath expression
    XPathExpression xpath    = XPathFactory.newInstance().newXPath().compile(selection);
    NodeList        nodes    = (NodeList) xpath.evaluate(document, XPathConstants.NODESET);

    // Remove all those nodes from the document
    for(int i = 0; i < nodes.getLength(); ++i) {
      Node n = nodes.item(i);
      n.getParentNode().removeChild(n);
    }

    // And finally print the document back into a string.
    StringWriter writer = new StringWriter();
    Transformer  tform  = TransformerFactory.newInstance().newTransformer();

    tform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    tform.transform(new DOMSource(document), new StreamResult(writer));

    // This is our result.
    String processed_xml = writer.getBuffer().toString();

    System.out.println(processed_xml);
  }
}
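
One detail that may matter for postal codes: in XPath 1.0, `postalcode=94510` is a numeric comparison, so the element text is converted to a number first and a value such as 094510 would match as well. Quoting the literal (`postalcode='94510'`) compares strings instead. The following is a small sketch to illustrate the difference; the class and method names (`PostalCodeMatch`, `countMatches`) are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PostalCodeMatch {
    // Parses the XML text and returns how many nodes the XPath expression selects.
    static int countMatches(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Postal code with a leading zero, chosen to show the difference
        String xml = "<customer><address><postalcode>094510</postalcode></address></customer>";

        // Numeric comparison: the text "094510" is converted to the number 94510
        System.out.println(countMatches(xml, "/customer/address[postalcode=94510]"));

        // String comparison: "094510" is not the same string as "94510"
        System.out.println(countMatches(xml, "/customer/address[postalcode='94510']"));
    }
}
```

Which form is appropriate depends on whether leading zeros and surrounding whitespace in the data should count as a match.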
Wintermute