
What I want to achieve:

Get the elements of an unknown XML file (the element names, and how many elements there are in the XML file).

Then get all the attributes, with their names and values, so I can use them later (e.g. for comparison with another XML file).


Researched: 1. 2. 3. 4. 5. And many more

Does anyone have an idea how to do this?

I don't want to predefine more than 500 arrays like the ones in my code snippet. Somehow I should be able to get the number of elements and the element names themselves dynamically.

EDIT!

Example1
<Root Attri1="" Attri2="">
    <element1 EAttri1="" EAttri2=""/>
    <Element2 EAttri1="" EAttri2="">
        <nestedelement3 NEAttri1="" NEAttri2=""/>
    </Element2> 
</Root>

Example2
<Root Attri1="" Attri2="" Attr="" At="">
    <element1 EAttri1="" EAttri2="">
        <nestedElement2 EAttri1="" EAttri2="">
            <nestedelement3 NEAttri1="" NEAttri2=""/>
        </nestedElement2>
    </element1> 
</Root>

Program Snippet:

String[] Example1 = {"element1", "Element2", "nestedelement3"};
String[] Example2 = {"element1", "nestedElement2", "nestedelement3"};

// one getElementsByTagName lookup per predefined tag name
for (int i = 0; i < Example1.length; i++) {
    NodeList Elements = oldDOC.getElementsByTagName(Example1[i]);
    for (int j = 0; j < Elements.getLength(); j++) {
        Node nodeinfo = Elements.item(j);
        NamedNodeMap attributes = nodeinfo.getAttributes();
        for (int l = 0; l < attributes.getLength(); l++) {
            // .....
        }
    }
}
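For comparison, here is a minimal DOM-only sketch of the same loop without any predefined arrays. It relies on the standard `Document.getElementsByTagName("*")`, where `*` matches every element (the root included) in document order. The class `DynamicDomWalk` and its methods are hypothetical, and parsing from a string instead of using `oldDOC` is an assumption made only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: lists every element with its attributes, no tag names predefined.
public class DynamicDomWalk {

    /** Parses an XML string into a DOM Document (stand-in for oldDOC). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** One "name: attr=value ..." line per element, in document order. */
    public static List<String> describe(Document doc) {
        List<String> lines = new ArrayList<>();
        // "*" matches all elements, so no list of tag names is needed
        NodeList elements = doc.getElementsByTagName("*");
        for (int j = 0; j < elements.getLength(); j++) {
            Node nodeinfo = elements.item(j);
            StringBuilder line = new StringBuilder(nodeinfo.getNodeName()).append(':');
            NamedNodeMap attributes = nodeinfo.getAttributes();
            for (int l = 0; l < attributes.getLength(); l++) {
                Node attr = attributes.item(l);
                line.append(' ').append(attr.getNodeName()).append('=').append(attr.getNodeValue());
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        Document doc = parse("<Root Attri1=\"a\"><element1 EAttri1=\"b\"/><element1 EAttri1=\"c\"/></Root>");
        describe(doc).forEach(System.out::println);
    }
}
```

The number of elements is simply `elements.getLength()` on the same `NodeList`.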

Output: The expected result is to get all the elements and all the attributes out of the XML file without predefining anything.

E.g. for Example1:

Elements: element1 Element2 nestedelement3

Attributes:  Attri1 Attri2 EAttri1 EAttri2 EAttri1 EAttri2 NEAttri1 NEAttri2
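A minimal sketch of how the two lists above could be produced with plain DOM, assuming (as in the expected output) that the root element is left out of the element list while its attributes are still included. Elements are collected by a recursive depth-first walk over child nodes, which yields document order; attributes come from every element via `getElementsByTagName("*")`. The class `ElementAttributeLister` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class producing the "Elements:" and "Attributes:" lists.
public class ElementAttributeLister {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Names of all elements below the root, depth-first (= document order). */
    public static List<String> elementNames(Document doc) {
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), names);
        return names;
    }

    private static void collect(Node parent, List<String> names) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace/text nodes
                names.add(child.getNodeName());
                collect(child, names);
            }
        }
    }

    /** Attribute names of every element, root included; duplicates are kept. */
    public static List<String> attributeNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            // DOM defines no order for NamedNodeMap, so the order within
            // one element may differ from the source file
            NamedNodeMap attrs = all.item(i).getAttributes();
            for (int a = 0; a < attrs.getLength(); a++) {
                names.add(attrs.item(a).getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<Root Attri1=\"\" Attri2=\"\">"
                + "<element1 EAttri1=\"\" EAttri2=\"\"/>"
                + "<Element2 EAttri1=\"\" EAttri2=\"\"><nestedelement3 NEAttri1=\"\" NEAttri2=\"\"/></Element2>"
                + "</Root>");
        System.out.println("Elements: " + String.join(" ", elementNames(doc)));
        System.out.println("Attributes: " + String.join(" ", attributeNames(doc)));
    }
}
```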
  • You can use JAXB, I think... I am not 100% sure. – Amit May 02 '18 at 10:55
  • You don't need a code snippet, you need a book. I recommend Elliotte Rusty Harold's book on XML processing in Java. I'm going to close this as off-topic, I'm afraid, because it's a "technology selection" question, and that's off-topic. But if you search for "XML parsing in Java" you'll find lots of ideas. – Michael Kay May 02 '18 at 11:04
  • How can this be off-topic? Currently I'm using a DOM parser to solve this, and the best idea I have is multiple for loops after I predefine the elements... What I'm looking for is a universal and easier way. If that's off-topic, then I'm pretty surprised by your opinion. – Szilágyi István May 02 '18 at 11:06
  • To turn it into a legitimate programming question you need to supply specific information about your inputs, your desired outputs, the code you have written so far, and the way in which your existing attempts fail to solve the problem. – Michael Kay May 02 '18 at 11:27
  • There you go, I hope it helps to understand what I want. – Szilágyi István May 02 '18 at 11:59
  • What is the expected output? Do you want to unmarshal into a POJO? Collect all attributes? Retain the association of elements to attributes? What?? – Sharon Ben Asher May 02 '18 at 12:40
  • It's written above the EDIT section: I want to get the elements, the element names and their number. That would solve my problem. Do I suck at asking questions? – Szilágyi István May 02 '18 at 12:50
  • 1) I do not see any such text in the question, above or below the edit section. 2) What do you mean by "Elements"? The entire XML element, including attributes and text? And why does your code show a loop over attributes? – Sharon Ben Asher May 02 '18 at 13:44
  • We can keep this ping-pong going on and on. If you want help, you need to describe the desired result data structure, or somehow make it clear what the intended output is. Collecting the info piece by piece will tire me out and I'll leave. – Sharon Ben Asher May 02 '18 at 13:47
  • Hi, I added some more text and narrowed the question down to a single point. I want to get everything out of an XML file, starting with the elements, which are the main part of an XML file. Once I have the element names and their number, I can dynamically get the attributes and their names and values for later use (anything, e.g. comparing to another XML file). – Szilágyi István May 03 '18 at 05:52
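For the "compare to another XML file" use case mentioned in the question, one possible approach, sketched under loudly stated assumptions: each document is reduced to a set of keys, one per element name plus one `element@attribute=value` key per attribute, and the two sets are diffed. Because sets are used, repeated elements collapse and position in the tree is ignored; a map of counts or full paths would be needed to capture those. The class `XmlStructureDiff` and its key format are hypothetical.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: compares two XML documents by element and attribute keys.
public class XmlStructureDiff {

    /** "element" and "element@attribute=value" keys; repeated keys collapse (it is a set). */
    public static Set<String> keys(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Set<String> keys = new TreeSet<>();
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Node element = all.item(i);
                keys.add(element.getNodeName());
                NamedNodeMap attrs = element.getAttributes();
                for (int a = 0; a < attrs.getLength(); a++) {
                    Node attr = attrs.item(a);
                    keys.add(element.getNodeName() + "@" + attr.getNodeName() + "=" + attr.getNodeValue());
                }
            }
            return keys;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Keys present in a but missing from b. */
    public static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> example1 = keys("<Root Attri1=\"\" Attri2=\"\"><element1 EAttri1=\"\" EAttri2=\"\"/>"
                + "<Element2 EAttri1=\"\" EAttri2=\"\"><nestedelement3 NEAttri1=\"\" NEAttri2=\"\"/></Element2></Root>");
        Set<String> example2 = keys("<Root Attri1=\"\" Attri2=\"\" Attr=\"\" At=\"\"><element1 EAttri1=\"\" EAttri2=\"\">"
                + "<nestedElement2 EAttri1=\"\" EAttri2=\"\"><nestedelement3 NEAttri1=\"\" NEAttri2=\"\"/></nestedElement2></element1></Root>");
        System.out.println("Only in Example1: " + onlyIn(example1, example2));
        System.out.println("Only in Example2: " + onlyIn(example2, example1));
    }
}
```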

1 Answer


The right tool for this job is XPath. It allows you to collect all or some elements and attributes based on various criteria. It is the closest you will get to a "universal" XML parser.
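As a small illustration of that idea, this sketch evaluates the two core expressions directly with the standard `javax.xml.xpath` API: `count(//*)` counts every element (root included) and `//@*` selects every attribute node. Evaluating against an `InputSource` built from a string is an assumption for self-containment; the class `XPathBasics` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for the XPath expressions "count(//*)" and "//@*".
public class XPathBasics {

    /** Evaluates an XPath expression directly against an XML string. */
    private static Object eval(String xml, String expr, QName returnType) {
        XPath xPath = XPathFactory.newInstance().newXPath(); // XPath objects are not thread-safe
        try {
            return xPath.evaluate(expr, new InputSource(new StringReader(xml)), returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate " + expr, e);
        }
    }

    /** "count(//*)" counts every element node, the root included. */
    public static int countElements(String xml) {
        return ((Number) eval(xml, "count(//*)", XPathConstants.NUMBER)).intValue();
    }

    /** "//@*" selects every attribute node in the document. */
    public static List<String> attributes(String xml) {
        NodeList attrs = (NodeList) eval(xml, "//@*", XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.add(attr.getNodeName() + "=" + attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Root Attri1=\"x\"><element1 EAttri1=\"y\"/><Element2/></Root>";
        System.out.println(countElements(xml));
        System.out.println(attributes(xml));
    }
}
```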

Here is the solution I came up with. It first finds all element names in the given XML document, then counts each element's occurrences and collects the results into a map. It does the same for attributes.
I added inline comments, and the method and variable names should be self-explanatory.

import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

import org.w3c.dom.*;

import javax.xml.parsers.*;
import javax.xml.xpath.*;

public class TestXpath
{

    public static void main(String[] args) {

        XPath xPath = XPathFactory.newInstance().newXPath();

        try (InputStream is = Files.newInputStream(Paths.get("C://temp/test.xml"))) {
            // parse file into xml doc
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document xmlDocument = builder.parse(is);

            // find all element names in xml doc
            Set<String> allElementNames = findNames(xmlDocument, xPath.compile("//*[name()]"));
            // for each name, count occurrences, and collect to map
            Map<String, Integer> elementsAndOccurrences = allElementNames.stream()
                .collect(Collectors.toMap(Function.identity(), name -> countElementOccurrences(xmlDocument, name)));
            System.out.println(elementsAndOccurrences);

            // find all attribute names in xml doc
            Set<String> allAttributeNames = findNames(xmlDocument, xPath.compile("//@*"));
            // for each name, count occurrences, and collect to map
            Map<String, Integer> attributesAndOccurrences = allAttributeNames.stream()
                .collect(Collectors.toMap(Function.identity(), name -> countAttributeOccurrences(xmlDocument, name)));
            System.out.println(attributesAndOccurrences);

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static Set<String> findNames(Document xmlDoc, XPathExpression xpathExpr) {
        try {
            NodeList nodeList = (NodeList)xpathExpr.evaluate(xmlDoc, XPathConstants.NODESET);
            // convert nodeList to set of node names
            return IntStream.range(0, nodeList.getLength())
                .mapToObj(i -> nodeList.item(i).getNodeName())
                .collect(Collectors.toSet());
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }
        return new HashSet<>();
    }

    public static int countElementOccurrences(Document xmlDoc, String elementName) {
        return countOccurrences(xmlDoc, elementName, "count(//*[name()='" + elementName + "'])");
    }

    public static int countAttributeOccurrences(Document xmlDoc, String attributeName) {
        return countOccurrences(xmlDoc, attributeName, "count(//@*[name()='" + attributeName + "'])");
    }

    public static int countOccurrences(Document xmlDoc, String name, String xpathExpr) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            Number count = (Number)xPath.compile(xpathExpr).evaluate(xmlDoc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }
        return 0;
    }
}
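A possible refinement, sketched under the assumption that only the name-to-count maps are needed: the code above evaluates a separate `count(...)` expression for every distinct name, so the document is scanned once per name. Evaluating `//*` or `//@*` once and grouping the resulting nodes by name with `Collectors.groupingBy(..., Collectors.counting())` yields the same counts in a single pass. The class `SinglePassCounts` is hypothetical, and the `TreeMap` is there only to make the printed order predictable.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical variant: one XPath evaluation, then count nodes by name.
public class SinglePassCounts {

    /** Evaluates a node-set XPath once and counts the returned nodes by node name. */
    public static Map<String, Long> countByName(String xml, String nodeSetExpr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(nodeSetExpr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return IntStream.range(0, nodes.getLength())
                    .mapToObj(i -> nodes.item(i).getNodeName())
                    .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate " + nodeSetExpr, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Root Attri1=\"\" Attri2=\"\">"
                + "<element1 EAttri1=\"\" EAttri2=\"\"/>"
                + "<Element2 EAttri1=\"\" EAttri2=\"\"><nestedelement3 NEAttri1=\"\" NEAttri2=\"\"/></Element2>"
                + "</Root>";
        // prints {Element2=1, Root=1, element1=1, nestedelement3=1}
        System.out.println(countByName(xml, "//*"));
        // prints {Attri1=1, Attri2=1, EAttri1=2, EAttri2=2, NEAttri1=1, NEAttri2=1}
        System.out.println(countByName(xml, "//@*"));
    }
}
```

The counts here are `Long` rather than `Integer`, because that is what `Collectors.counting()` produces.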
Sharon Ben Asher