0

Say, I have a String:

String someString = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";

In this String the position of the "Content" is known.

Now I want to turn the innermost divs into span tags. So what I want to do is:

someString.replacePreviousOccurrence(someString.indexOf("Content"), "<div ", "<span>");
someString.replaceNextOccurrence(someString.indexOf("Content"), "</div>", "</span>");

Is there something in Java to do this? Or just to get the index of a previous and next occurrence of a substring from a specified index?

Edit: I forgot to specify that the divs may have unknown attributes (classes and so on) and that there may be other markup in between (like the <b> tag in the example).

Simon Baars

2 Answers

1

You can definitely do this with regex, though it may not be the most elegant solution. Here is a pattern you might use: <div\b([^>]*)>(?!<div\b)(.*?)(?<!</div>)</div>

This works by using a negative lookahead and a negative lookbehind. The negative lookahead (?!<div\b) says "match here only if what follows is not another opening div tag", and the negative lookbehind (?<!</div>) says "match here only if what precedes is not </div>". The [^>]* part allows for unknown attributes on the opening tag, and the two capture groups keep those attributes and the tag's content for the replacement.

So the pattern broken down:

<div\b([^>]*)>   //matches an opening div tag and captures its attributes (group 1)
    (?!<div\b)   //that isn't directly followed by another opening div tag
    (.*?)        //followed by as few characters as possible (group 2)
    (?<!</div>)  //where the closing tag isn't directly preceded by </div>
</div>           //matches </div>

So for this problem you can do something like the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "<html><body><div><div><div class=\"unknown\"><b>Content</b></div></div></div></body></html>";
// Backslashes are doubled because the pattern is a Java string literal
Pattern p = Pattern.compile("<div\\b([^>]*)>(?!<div\\b)(.*?)(?<!</div>)</div>");
Matcher m = p.matcher(str);
// $1 = attributes of the opening tag, $2 = everything between the two tags
String output = m.replaceAll("<span$1>$2</span>");
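
If the div's content may itself contain other tags, another way to express "innermost" is to forbid any div tag, opening or closing, between the two ends of the match. The following is only a sketch under that assumption. The class and method names are made up for illustration, and as with any regex approach it is best kept to small, predictable snippets of markup.

```java
import java.util.regex.Pattern;

class InnermostDivToSpan {
    // (?:(?!</?div\b).)* consumes one character at a time, but only while
    // no opening or closing div tag starts at the current position, so the
    // whole pattern can only match a div that contains no other div.
    private static final Pattern INNERMOST_DIV = Pattern.compile(
            "<div\\b([^>]*)>((?:(?!</?div\\b).)*)</div>", Pattern.DOTALL);

    // Hypothetical helper: rewrites every innermost div as a span,
    // keeping its attributes ($1) and its content ($2).
    static String innermostDivsToSpans(String html) {
        return INNERMOST_DIV.matcher(html).replaceAll("<span$1>$2</span>");
    }

    public static void main(String[] args) {
        String html = "<html><body><div><div><div class=\"unknown\"><b>Content</b>"
                + "</div></div></div></body></html>";
        System.out.println(innermostDivsToSpans(html));
    }
}
```

Note that this rewrites every innermost div in the input, not only the one around "Content"; to limit it, you could run it on just the substring around the known position.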
gwcoderguy
  • A great solution for my problem. However, I find it strange that not even any Apache library contains a "replacePreviousOccurrence" and "replaceNextOccurrence" method. I don't understand why Java would give you methods like indexOf and lastIndexOf to find the first and last index of a substring, but none for all in between. – Simon Baars Apr 18 '17 at 17:41
  • Here's an interesting approach you can try: http://stackoverflow.com/questions/19035893/finding-second-occurrence-of-a-substring-in-a-string-in-java Basically, you can call indexOf() with the index where you'd like to begin your search. You can use this to get both the previous and next occurrences. Though I agree it would be a nice function for them to include! – gwcoderguy Apr 18 '17 at 18:31
  • Looks awesome @gwcoderguy. I did know of that function, but didn't see how to get the previous occurrence. Would you mind explaining how? – Simon Baars Apr 18 '17 at 18:53
  • Sorry, I was hoping that there would be a method with a toIndex parameter... Either way you can do something like: `str.substring(0, str.indexOf("Content")).lastIndexOf(targetString);` A little ugly though... – gwcoderguy Apr 18 '17 at 20:38
  • Still a very stable alternative. Thanks for these insights. – Simon Baars Apr 18 '17 at 20:46
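
For what the comments above are after, String does have two-argument overloads: indexOf(String, int) searches forward from an index and lastIndexOf(String, int) searches backward from one. Here is a rough sketch of how the "previous" and "next" lookups from the question could be built on them. The helper names are hypothetical (they are not JDK or Apache methods), and it assumes the nearest "<div" before the marker and the nearest "</div>" after it belong to the same element.

```java
class OccurrenceSearch {
    /** Index of the last occurrence of target that starts before fromIndex, or -1. */
    static int previousIndexOf(String s, String target, int fromIndex) {
        // lastIndexOf(str, i) searches backward and accepts matches starting at or before i
        return s.lastIndexOf(target, fromIndex - 1);
    }

    /** Index of the first occurrence of target that starts at or after fromIndex, or -1. */
    static int nextIndexOf(String s, String target, int fromIndex) {
        return s.indexOf(target, fromIndex);
    }

    /** Turns the div that most closely surrounds the marker text into a span. */
    static String nearestDivToSpan(String s, String marker) {
        int pos = s.indexOf(marker);
        if (pos < 0) return s;
        int open = previousIndexOf(s, "<div", pos);
        int close = nextIndexOf(s, "</div>", pos);
        if (open < 0 || close < 0) return s;
        StringBuilder sb = new StringBuilder(s);
        // Replace the later range first so the earlier index stays valid
        sb.replace(close, close + "</div>".length(), "</span>");
        sb.replace(open, open + "<div".length(), "<span");
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "<div><div><div class=\"unknown\"><b>Content</b></div></div></div>";
        System.out.println(nearestDivToSpan(s, "Content"));
    }
}
```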
1

You could use the built-in functionality for working with XML.

This is, sadly, very verbose, but it works as long as the markup is well-formed XML.

    public static void replaceDivWithSpanByText() throws ParserConfigurationException, IOException, SAXException, XPathExpressionException, TransformerException {
        String html = "<html><body><div><div><div>Content</div></div></div></body></html>";
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));

        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xpath = xPathFactory.newXPath();
        Node contentNode = (Node) xpath.evaluate(".//div[text() = 'Content']", doc, XPathConstants.NODE);
        doc.renameNode(contentNode, null, "span");


        DOMSource domSource = new DOMSource(doc);
        StringWriter writer = new StringWriter();
        StreamResult result = new StreamResult(writer);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = tf.newTransformer();
        transformer.transform(domSource, result);

        System.out.println(writer.toString()); 
    }

Note that in this example I use XPath to select the node by its text (".//div[text() = 'Content']"); selecting by id, class, or other attributes is just as easy. If you do this kind of replacement a lot, writing a generic class to handle it could be a good idea.
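
As a rough idea of what such a generic helper might look like, here is a sketch that takes the XPath expression and the new tag name as parameters. The class and method names are made up, it assumes the input is well-formed XML, and the XPath in main is just one possible way to pick the innermost div that contains the text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementRenamer {
    // Hypothetical helper: renames the first node matched by xpathExpr.
    static String renameFirstMatch(String xml, String xpathExpr, String newName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpathExpr, doc, XPathConstants.NODE);
        if (node == null) {
            return xml; // nothing matched, leave the input untouched
        }
        // Attributes and child nodes are kept; only the element name changes
        doc.renameNode(node, null, newName);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><div><div><div class=\"unknown\"><b>Content</b>"
                + "</div></div></div></body></html>";
        // A div whose text contains "Content" and that has no div descendants
        String innermost = "//div[contains(., 'Content')][not(.//div)]";
        System.out.println(renameFirstMatch(html, innermost, "span"));
    }
}
```

Selecting by attribute instead would only change the expression, for example //div[@class='unknown'].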

Raudbjorn
  • For this issue, this solves my problem. However, I find it strange that not even any Apache library contains a "replacePreviousOccurrence" and "replaceNextOccurrence" method. I don't understand why Java would give you methods like indexOf and lastIndexOf to find the first and last index of a substring, but none for all in between. – Simon Baars Apr 18 '17 at 17:38