1

I have the following XML:

<some_tag>
    <some_nested_tag attr="Hello"> Text </some_nested_tag>
    Hello world Hello Programming
</some_tag>

From the above XML, I want to replace the occurrences of the word "Hello" that are part of the tag content, but not those that are part of a tag attribute.

I want the following output (replacing Hello with HI):

<some_tag>
    <some_nested_tag attr="Hello"> Text </some_nested_tag>
    HI world HI Programming
</some_tag>

I tried Java regex and some DOM parser tutorials, but without any luck. I am posting here because I have limited time to fix this in my project. Any help would be appreciated.

Manfred Radlwimmer
mihir S

4 Answers

2

This can be done with a negative lookbehind.

Try this regex:

(?<!attr=")Hello

It matches Hello only when it is not immediately preceded by attr=".

In Java, the double quote inside the pattern has to be escaped:

str = str.replaceAll("(?<!attr=\")Hello", "HI");

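It can also be done with a negative lookahead. This works for any attribute name, because it skips every Hello that is followed by a > with no < in between (that is, every Hello inside a tag):

Hello(?!([^<]+)?>)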
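
As a minimal sketch, the lookahead can be applied to the sample from the question like this. The class and method names (`LookaheadReplace`, `replaceOutsideTags`) are made up for illustration, and the root tag is written as `some_tag` so that the input is well-formed.

```java
class LookaheadReplace {

    // Replaces "Hello" only where it is NOT followed by a '>' that comes
    // before the next '<', i.e. where it does not sit inside a tag.
    static String replaceOutsideTags(String xml) {
        return xml.replaceAll("Hello(?!([^<]+)?>)", "HI");
    }

    public static void main(String[] args) {
        String xml = "<some_tag><some_nested_tag attr=\"Hello\"> Text </some_nested_tag>"
                   + " Hello world Hello Programming </some_tag>";
        // prints: <some_tag><some_nested_tag attr="Hello"> Text </some_nested_tag> HI world HI Programming </some_tag>
        System.out.println(replaceOutsideTags(xml));
    }
}
```

This relies on the text content never containing a literal `>`, which XML allows. For input with such text, CDATA sections, or comments, a real parser is the safer choice.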
Lucas Araujo
  • Thanks for your answer, but it can be any attribute name; I used 'attr' only as an example. – mihir S May 07 '16 at 21:44
  • Updated the answer with negative lookahead. – Lucas Araujo May 07 '16 at 21:51
  • Thanks, it worked for me. I changed the regex a bit because our XML also had target words in quotes, like "Hello", and also a space after the closing ">". This is the one I used in my code: "(?!([^<]+)?>)\\b" + Word_to_be_replaced + "\\b" – mihir S May 08 '16 at 19:14
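
The variant from the last comment can be sketched as follows. It is only an illustration: `WordReplace` and `replaceWord` are hypothetical names, and `Pattern.quote` and `Matcher.quoteReplacement` are added here (they are not in the comment) so that a word or replacement containing regex metacharacters is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordReplace {

    // Builds the commenter's pattern: lookahead first, then the whole word
    // between word boundaries. The word is quoted so it is matched literally.
    static String replaceWord(String xml, String word, String replacement) {
        String regex = "(?!([^<]+)?>)\\b" + Pattern.quote(word) + "\\b";
        return xml.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String xml = "<a attr=\"Hello\"> \"Hello\" Helloween </a>";
        // The quoted word in the text is replaced; the attribute value and
        // the longer word "Helloween" are left alone.
        System.out.println(replaceWord(xml, "Hello", "HI"));
    }
}
```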
0
string = string.replaceAll("(?i)\\shello\\s", " HI ");

Regex explanation:

\sHello\s

Options: case insensitive

«\s» matches a single whitespace character (space, tab, line feed, carriage return, vertical tab, form feed).
«Hello» matches the character string "Hello" literally (case insensitive).
«\s» matches a single whitespace character, as above.

Replacement:

 HI 

« HI » inserts the character string " HI " literally.

Note that this pattern only matches Hello when it has whitespace on both sides, and it does not distinguish text content from attribute values. Because each match consumes the surrounding whitespace, the second of two Hellos separated by a single space is not replaced, and a line break next to a match is turned into a space.

Regex101 Demo
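
A possible refinement, sketched here as an assumption rather than as part of the original answer, is to use lookarounds for the whitespace so that it is checked but not consumed. `WhitespaceReplace` is a hypothetical name. Like the pattern above, this still makes no distinction between text content and attribute values.

```java
class WhitespaceReplace {

    // (?<=\s) and (?=\s) require whitespace on both sides without consuming
    // it, so line breaks are preserved and adjacent matches are both found.
    static String replace(String input) {
        return input.replaceAll("(?i)(?<=\\s)hello(?=\\s)", "HI");
    }

    public static void main(String[] args) {
        // prints "say HI HI" on the first line and "HI now" on the second
        System.out.println(replace("say Hello hello\nHELLO now"));
    }
}
```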

Pedro Lobito
0

XSLT is a language for transforming XML documents into other XML documents. You can match all the text nodes containing 'Hello' and replace the content of those particular nodes.

A small example of using XSLT in Java:

import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TestMain {
    public static void main(String[] args) throws TransformerException {
        // Compile the stylesheet that holds the replacement rules
        TransformerFactory factory = TransformerFactory.newInstance();
        Source xslt = new StreamSource(new File("transform.xslt"));
        Transformer transformer = factory.newTransformer(xslt);

        // Apply it to input.xml and write the result to output.xml
        Source text = new StreamSource(new File("input.xml"));
        transformer.transform(text, new StreamResult(new File("output.xml")));
    }
}

There is a good question on replacing strings with XSLT, where you can find an example XSLT template: XSLT string replace
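
For illustration, here is a self-contained sketch of such a stylesheet, embedded as a string so that it runs without extra files. It assumes an XSLT 1.0 processor such as the JDK's built-in one, which has no replace() function, so the template recurses with substring-before and substring-after. The class name `XsltReplace`, the template name `replace`, and the hard-coded words are assumptions for this example, not taken from the linked question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltReplace {

    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // Identity template: copies elements, attributes and other nodes unchanged
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // Text nodes use this template instead (explicit higher priority)
      + "<xsl:template match='text()' priority='1'>"
      + "<xsl:call-template name='replace'><xsl:with-param name='text' select='.'/></xsl:call-template>"
      + "</xsl:template>"
      // Recursive replace: emit the part before 'Hello', then 'HI', then recurse on the rest
      + "<xsl:template name='replace'>"
      + "<xsl:param name='text'/>"
      + "<xsl:choose>"
      + "<xsl:when test=\"contains($text, 'Hello')\">"
      + "<xsl:value-of select=\"substring-before($text, 'Hello')\"/>"
      + "<xsl:text>HI</xsl:text>"
      + "<xsl:call-template name='replace'>"
      + "<xsl:with-param name='text' select=\"substring-after($text, 'Hello')\"/>"
      + "</xsl:call-template>"
      + "</xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='$text'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(
            "<some_tag><some_nested_tag attr=\"Hello\"> Text </some_nested_tag>"
          + " Hello world Hello Programming </some_tag>"));
    }
}
```

Because attributes are copied by the identity template and only text nodes go through the `replace` template, attribute values containing Hello stay untouched.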

bedrin
0

Here is a fully functional example using a SAX parser. It is adapted to your case, with minimal changes, from this example.

The actual replacement takes place in MyCopyHandler#startElement() and MyCopyHandler#endElement(), and the text content of each element is collected in MyCopyHandler#characters(). Note the buffer handling too: it is important for mixed element content (text interleaved with child elements).

I know an XSLT solution is also possible, but it is not as portable.

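import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.io.StringReader;
import java.util.Arrays;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class XMLReplace {

    /**
     * Parses a sample document and prints a copy in which "Hello" is replaced
     * by "HI" in text content only; attribute values are left unchanged.
     */
    public static void main(String[] args) throws Exception {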

        final String str = "<root> Hello <nested attr='Hello'> Text </nested>  Hello world Hello Programming </root>";

        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser parser = spf.newSAXParser();
        XMLReader reader = parser.getXMLReader();
        reader.setErrorHandler(new MyErrorHandler());
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(baos);
        MyCopyHandler duper = new MyCopyHandler(out);
        reader.setContentHandler(duper);
        InputSource is = new InputSource(new StringReader(str));
        reader.parse(is);
        out.close();
        System.out.println(baos);
    }

}

class MyCopyHandler implements ContentHandler {
    private boolean namespaceBegin = false;

    private String currentNamespace;

    private String currentNamespaceUri;

    private Locator locator;

    private final PrintWriter out;

    private final StringBuilder buffer = new StringBuilder();

    public MyCopyHandler(PrintWriter out) {
        this.out = out;
    }

    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    public void startDocument() {
    }

    public void endDocument() {
    }

    public void startPrefixMapping(String prefix, String uri) {
        namespaceBegin = true;
        currentNamespace = prefix;
        currentNamespaceUri = uri;
    }

    public void endPrefixMapping(String prefix) {
    }

    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {

        // Flush buffer - needed in case of mixed content (text + elements)
        out.print(buffer.toString().replaceAll("Hello", "HI"));
        // Prepare to collect element text content
        this.buffer.setLength(0);

        out.print("<" + qName);
        if (namespaceBegin) {
            out.print(" xmlns:" + currentNamespace + "=\"" + currentNamespaceUri + "\"");
            namespaceBegin = false;
        }
        for (int i = 0; i < atts.getLength(); i++) {
            out.print(" " + atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
        }
        out.print(">");
    }

    public void endElement(String namespaceURI, String localName, String qName) {
        // Process text content
        out.print(buffer.toString().replaceAll("Hello", "HI"));
        out.print("</" + qName + ">");
        // Reset buffer
        buffer.setLength(0);
    }

    public void characters(char[] ch, int start, int length) {
        // Store chunk of text - parser is allowed to provide text content in chunks for performance reasons
        buffer.append(Arrays.copyOfRange(ch, start, start + length));
    }

    public void ignorableWhitespace(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++)
            out.print(ch[i]);
    }

    public void processingInstruction(String target, String data) {
        out.print("<?" + target + " " + data + "?>");
    }

    public void skippedEntity(String name) {
        out.print("&" + name + ";");
    }
}

class MyErrorHandler implements ErrorHandler {
    public void warning(SAXParseException e) throws SAXException {
        show("Warning", e);
        throw (e);
    }

    public void error(SAXParseException e) throws SAXException {
        show("Error", e);
        throw (e);
    }

    public void fatalError(SAXParseException e) throws SAXException {
        show("Fatal Error", e);
        throw (e);
    }

    private void show(String type, SAXParseException e) {
        System.out.println(type + ": " + e.getMessage());
        System.out.println("Line " + e.getLineNumber() + " Column " + e.getColumnNumber());
        System.out.println("System ID: " + e.getSystemId());
    }
}
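
One thing to keep in mind with the handler above: a SAX parser delivers text and attribute values already decoded (for example `&amp;` arrives as `&`), and the handler prints them as they are. If the input can contain such characters, they need to be escaped again on output. A minimal sketch of helpers for that, with the hypothetical names `XmlEscape`, `escapeText` and `escapeAttribute`, might look like this:

```java
class XmlEscape {

    // Escapes the characters that must not appear raw in element text.
    // '&' is handled first so the entities added afterwards are not re-escaped.
    static String escapeText(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Attribute values written inside double quotes also need '"' escaped.
    static String escapeAttribute(String s) {
        return escapeText(s).replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a & b < c"));
        System.out.println(escapeAttribute("say \"hi\" & go"));
    }
}
```

In the handler, the buffered text could then be passed through `escapeText` after the replacement and before printing, and each attribute value through `escapeAttribute`.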
dimplex