Java Replace words within xml

Question

I have the following xml

<some tag>
    <some_nested_tag attr="Hello"> Text </some_nested_tag>
    Hello world Hello Programming
</some tag>

From the above XML, I want to replace the occurrences of the word "Hello" that are part of the tag content but not part of a tag attribute.

I want the following output (Replacing Hello by HI):

<some tag>
    <some_nested_tag attr="Hello"> Text </some_nested_tag>
    HI world HI Programming
</some tag>

I tried Java regex and some of the DOM parser tutorials, but without any luck. I am posting here for help as I have limited time available to fix this in my project. Help would be appreciated.

Regex is NOT the way to do this. Better to parse it and modify the tag content. — duffymo, May 07 '16 at 21:27
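For the parse-and-modify route suggested in the comment above, a minimal DOM sketch could look like the following. The class and method names (DomTextReplaceSketch, replaceInXml, replaceInTextNodes) are made up for illustration, and the sample document uses <root> because "some tag" with a space is not a legal element name. Attributes are not child nodes in DOM, so the walk never visits them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original question or answers.
final class DomTextReplaceSketch {

    // Recursively rewrite text nodes; attribute values are never reached
    // because attributes are not children of their element.
    static void replaceInTextNodes(Node node, String target, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(target, replacement));
        }
        // CDATA sections have a different node type and are left alone here.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            replaceInTextNodes(child, target, replacement);
        }
    }

    static String replaceInXml(String xml, String target, String replacement) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            replaceInTextNodes(doc.getDocumentElement(), target, replacement);

            // An identity transform serializes the modified tree back to a string.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><nested attr=\"Hello\"> Text </nested> Hello world Hello Programming</root>";
        System.out.println(replaceInXml(xml, "Hello", "HI"));
    }
}
```

String.replace is used rather than replaceAll so the target is treated as a literal, not as a regex.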

Lucas Araujo · Accepted Answer · 2016-05-07T21:50:05.423

2

That can be done by using a negative lookbehind.

Try this regex:

(?<!attr=")Hello

It will match Hello that is not preceded by attr=".

So you could try this:

str = str.replaceAll("(?<!attr=\")Hello", "Hi");

It can also be done by negative lookahead:

Hello(?!([^<]+)?>)
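As a quick check of the lookahead version, here is a small sketch that applies it to a string shaped like the sample in the question. The class and method names are only for illustration. The lookahead rejects any Hello that can reach a ">" without first crossing a "<", which is what being inside a tag looks like to a regex.

```java
// Hypothetical demo class, names are not from the original answer.
final class LookaheadReplaceSketch {

    // Replace "Hello" unless a '>' can be reached from it without passing a '<'.
    static String replaceOutsideTags(String xml) {
        return xml.replaceAll("Hello(?!([^<]+)?>)", "HI");
    }

    public static void main(String[] args) {
        String xml = "<root><nested attr=\"Hello\"> Text </nested> Hello world Hello Programming</root>";
        System.out.println(replaceOutsideTags(xml));

        // Caveat: a literal '>' in text content looks like the end of a tag,
        // so this Hello is left unchanged.
        System.out.println(replaceOutsideTags("<a>Hello > you</a>"));
    }
}
```

The second call shows the main limitation: the pattern only approximates "inside a tag", so text that contains a bare ">" is skipped, and comments or CDATA sections are not treated specially.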

edited May 07 '16 at 21:50

answered May 07 '16 at 21:20


Thanks for your answer, but it can be any attribute name; I had put 'attr' as an example – mihir S May 07 '16 at 21:44
Updated the answer with negative lookahead. – Lucas Araujo May 07 '16 at 21:51
Thanks, it worked for me. I changed the regex a bit because our XML also had target words in quotes, like "Hello", and spaces after the closing ">" of a tag. This is the one I used in my code: "(?!([^<]+)?>)\\b" + Word_to_be_replaced + "\\b" – mihir S May 08 '16 at 19:14
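The word-boundary variant from the last comment can be wrapped in a small helper. This is only a sketch: the class and method names are invented here, and Pattern.quote and Matcher.quoteReplacement are additions of mine so that a word or replacement containing regex metacharacters (or "$") is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the pattern quoted in the comment above.
final class WholeWordReplaceSketch {

    static String replaceWord(String xml, String word, String replacement) {
        // Same shape as the commenter's regex: lookahead first, then the
        // word wrapped in \b so that only whole words match.
        Pattern pattern = Pattern.compile("(?!([^<]+)?>)\\b" + Pattern.quote(word) + "\\b");
        return pattern.matcher(xml).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String xml = "<a x=\"Hello\">say \"Hello\", HelloWorld</a>";
        // The quoted word in the text is replaced; the attribute value and
        // the longer word HelloWorld are not.
        System.out.println(replaceWord(xml, "Hello", "HI"));
    }
}
```

Note that \b only behaves as expected when the word starts and ends with a letter, digit, or underscore.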

score 0 · Answer 2 · answered May 07 '16 at 21:32

string = string.replaceAll("(?i)\\shello\\s", " HI ");

Regex Explanation:

\sHello\s

Options: Case insensitive

Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
Match the character string “Hello” literally (case insensitive) «Hello»
Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»

Replacement:

Insert the character string “ HI ” literally « HI »

Regex101 Demo
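To see how the whitespace-delimited pattern behaves, here is a small sketch with invented class and method names. It handles a string shaped like the sample in the question, but since the pattern knows nothing about tags, two edge cases are worth keeping in mind: a Hello surrounded by spaces inside an attribute value is replaced too, and the trailing \s consumes the separator, so the second of two adjacent words is missed.

```java
// Hypothetical demo class for the \sHello\s approach.
final class WhitespaceDelimitedReplaceSketch {

    static String replaceSpaced(String s) {
        return s.replaceAll("(?i)\\shello\\s", " HI ");
    }

    public static void main(String[] args) {
        // Works on input like the question's sample.
        System.out.println(replaceSpaced(
                "<root><nested attr=\"Hello\"> Text </nested> Hello world Hello Programming</root>"));

        // Attribute values with spaces around the word are changed as well.
        System.out.println(replaceSpaced("<a t=\"say Hello now\">x</a>"));

        // The first match consumes the space between the two words,
        // so the second Hello has no leading whitespace left to match.
        System.out.println(replaceSpaced("<a> Hello Hello </a>"));
    }
}
```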

score 0 · Answer 3 · edited May 23 '17 at 10:33

XSLT is a language for transforming XML documents into other XML documents. You can match all the text nodes containing 'Hello' and replace the content of those particular nodes.

A small example of using XSLT in Java:

import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class TestMain {
    public static void main(String[] args) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Source xslt = new StreamSource(new File("transform.xslt"));
        Transformer transformer = factory.newTransformer(xslt);

        Source text = new StreamSource(new File("input.xml"));
        transformer.transform(text, new StreamResult(new File("output.xml")));
    }
}

There is a good question on string replacement in XSLT, where you can find an example of an XSLT template: XSLT string replace
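For completeness, here is a self-contained sketch of what such a stylesheet might look like, embedded as a string so it runs without extra files. This is my own illustration, not the template from the linked question. It assumes an XSLT 1.0 processor (the one bundled with the JDK), which has no built-in replace function, so a recursive named template built on contains, substring-before and substring-after does the work. The class name and the hard-coded Hello/HI pair are placeholders.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: identity transform plus a text() override.
final class XsltReplaceSketch {

    private static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity template: copies elements, attributes and everything else as-is.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Text nodes only; explicit priority avoids a conflict with node().
        + "<xsl:template match='text()' priority='1'>"
        + "<xsl:call-template name='replace'><xsl:with-param name='text' select='.'/></xsl:call-template>"
        + "</xsl:template>"
        // Recursive replace: emit the part before the match, then HI, then recurse on the rest.
        + "<xsl:template name='replace'>"
        + "<xsl:param name='text'/>"
        + "<xsl:choose>"
        + "<xsl:when test=\"contains($text, 'Hello')\">"
        + "<xsl:value-of select=\"substring-before($text, 'Hello')\"/>"
        + "<xsl:text>HI</xsl:text>"
        + "<xsl:call-template name='replace'>"
        + "<xsl:with-param name='text' select=\"substring-after($text, 'Hello')\"/>"
        + "</xsl:call-template>"
        + "</xsl:when>"
        + "<xsl:otherwise><xsl:value-of select='$text'/></xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
                "<root><nested attr=\"Hello\"> Text </nested> Hello world Hello Programming</root>"));
    }
}
```

Attribute values go through the identity template untouched, so only text content is rewritten.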

dimplex · Answer 4 · 2016-05-09T14:53:52.980

Here is a fully functional example using a SAX parser. It is adapted to your case with minimal changes from this example.

The actual replacement takes place in MyCopyHandler#endElement() and MyCopyHandler#startElement(), and the XML element text content is collected in MyCopyHandler#characters(). Note the buffer maintenance too: it is important for handling mixed element content (text and child elements).

I know an XSLT solution is also possible, but it is not that portable.

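import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class XMLReplace {

    /**
     * Parses a sample document and prints a copy in which "Hello" is
     * replaced by "HI" in element text content only.
     */
    public static void main(String[] args) throws Exception {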

        final String str = "<root> Hello <nested attr='Hello'> Text </nested>  Hello world Hello Programming </root>";

        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser parser = spf.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        reader.setErrorHandler(new MyErrorHandler());
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(baos);
        MyCopyHandler duper = new MyCopyHandler(out);
        reader.setContentHandler(duper);
        InputSource is = new InputSource(new StringReader(str));
        reader.parse(is);
        out.close();
        System.out.println(baos);
    }

}

class MyCopyHandler implements ContentHandler {
    private boolean namespaceBegin = false;

    private String currentNamespace;

    private String currentNamespaceUri;

    private Locator locator;

    private final PrintWriter out;

    private final StringBuilder buffer = new StringBuilder();

    public MyCopyHandler(PrintWriter out) {
        this.out = out;
    }

    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    public void startDocument() {
    }

    public void endDocument() {
    }

    public void startPrefixMapping(String prefix, String uri) {
        namespaceBegin = true;
        currentNamespace = prefix;
        currentNamespaceUri = uri;
    }

    public void endPrefixMapping(String prefix) {
    }

    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {

        // Flush buffer - needed in case of mixed content (text + elements)
        out.print(buffer.toString().replaceAll("Hello", "HI"));
        // Prepare to collect element text content
        this.buffer.setLength(0);

        out.print("<" + qName);
        if (namespaceBegin) {
            out.print(" xmlns:" + currentNamespace + "=\"" + currentNamespaceUri + "\"");
            namespaceBegin = false;
        }
        for (int i = 0; i < atts.getLength(); i++) {
            out.print(" " + atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
        }
        out.print(">");
    }

    public void endElement(String namespaceURI, String localName, String qName) {
        // Process text content
        out.print(buffer.toString().replaceAll("Hello", "HI"));
        out.print("</" + qName + ">");
        // Reset buffer
        buffer.setLength(0);
    }

    public void characters(char[] ch, int start, int length) {
        // Store chunk of text - parser is allowed to provide text content in chunks for performance reasons
        buffer.append(Arrays.copyOfRange(ch, start, start + length));
    }

    public void ignorableWhitespace(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++)
            out.print(ch[i]);
    }

    public void processingInstruction(String target, String data) {
        out.print("<?" + target + " " + data + "?>");
    }

    public void skippedEntity(String name) {
        out.print("&" + name + ";");
    }
}

class MyErrorHandler implements ErrorHandler {
    public void warning(SAXParseException e) throws SAXException {
        show("Warning", e);
        throw (e);
    }

    public void error(SAXParseException e) throws SAXException {
        show("Error", e);
        throw (e);
    }

    public void fatalError(SAXParseException e) throws SAXException {
        show("Fatal Error", e);
        throw (e);
    }

    private void show(String type, SAXParseException e) {
        System.out.println(type + ": " + e.getMessage());
        System.out.println("Line " + e.getLineNumber() + " Column " + e.getColumnNumber());
        System.out.println("System ID: " + e.getSystemId());
    }
}
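One thing the handler above does not do is re-escape what it prints. The parser hands characters() and the attribute values over already unescaped, so input containing &amp; or &lt; would be written back as a bare & or <, and the output would no longer be well-formed. A minimal sketch of an escaping helper follows; the class and method names are hypothetical, and the idea is that the out.print calls for text and attribute values would go through it.

```java
// Hypothetical helper for re-escaping SAX output; not part of the answer's code.
final class XmlEscapeSketch {

    // Escape text content. '&' must be handled first so that the
    // entities produced by the later replacements are not escaped again.
    static String escapeText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Attribute values written inside double quotes also need '"' escaped.
    static String escapeAttribute(String s) {
        return escapeText(s).replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a < b & c"));
        System.out.println(escapeAttribute("say \"hi\" & <go>"));
    }
}
```

In the handler this could be used as out.print(XmlEscapeSketch.escapeText(...)) around the replaced buffer content and XmlEscapeSketch.escapeAttribute(atts.getValue(i)) for attributes.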
