Algorithm for Parsing an XML

Question

I was reading one of the interview question about parsing an XML.

I wrote a very high level and incomplete skeleton of the algorithm , and was looking for some help in writing a simple algo [I assume one exists, because this was asked as an interview question so should be doable in 45 mins I guess].

Here is my attempt:

   // Assume well-formedness
    public static Node parseXML(String xml)
    {

        Node root = new XMLParse().new Node();

        while(!helper.endOfElements())
        {
            // Returns name of root element
            root.name = helper.getName(xml);
            // Returns value of root element
            root.name = helper.getValue(xml);

            // returns each child in a String and returns all such children as 
            // a String Array
            // Basically splits based on <> </> elements and return that as a String
            String[] children = helper.getChildren(xml);

            if(children.length!=0)
            {
                root.childList = new ArrayList<XMLParse.Node>();
                for(int i=0; i<children.length;i++)
                {
                    root.childList.add(parseXML(children[i]));
                }
            }

        }


        return root;
    }


    class Node
    {
        String name;
        String value;
        List<Node> childList;

        public String getName()
        {
            return name;
        }

        public String getValue()
        {
            return value;
        }

        public List<Node> getChildList()
        {
            return childList;
        }

    }


Class helper()
{

// Returns the name of the root of the xml
public static String getName(String XML);

// Returns the value of the root of the xml
public static String getValue(String XML)

// Splits the XML into top level childern of the root of the passed XML
public static String[] getChildren(String XML)

}

I am hoping someone can give me a pseudo-code/code for doing this or may be provide an easy way of implementing the helper function in my algo.

I know there are built in classes to do this like in here , but using them would beat the purpose I guess. Also, many things used in this link are just interfaces so I couldnt find any implementation of say docBuilder.parse (new File("book.xml")) method.

Why would knowing when to use which XML parser beat the purpose? In reality nobody would ever implement their own XML parser, so if the interviewee actually knew how to use those classes and their pros/cons I'd consider this quite useful.. and as you see yourself you wouldn't have passed that test ;) So better look up that before trying to write your own parser from grounds up. — Voo, Jun 13 '11 at 00:12
I remembered writing a short writeup about the different versions in java some time ago.. [this](http://stackoverflow.com/questions/5059224/which-is-the-best-library-for-xml-parsing-in-java/5059411#5059411) could be a useful starting point. — Voo, Jun 13 '11 at 00:14

score 21 · Answer 1 · edited Apr 18 '18 at 04:55

About the actual question

This is an example of an interview question that is designed to expose the inexperience of the applicant. An applicant that hasn't been around the industry much (or who just put XML on their resume without ever having implemented a project) might eagerly go off and start churning code.

An applicant that has successfully navigated a few interviews or completed a few projects that use XML will step back and make a few observations

Writing and testing a standards compliant parser is not going to be possible in an interview setting.
- This would take weeks, months or longer. If a question is asked but the answer takes too long, the interviewer is really asking a different question
- With enough experience you can see these "trick questions" and respond accordingly (Ask for more information about what the interviewer is looking for)
The question asks for two ways to parse XML.
- Is this a coincidence? No. There are two popular techniques of processing XML.
- The question could have been "Describe two popular ways to parse XML and discuss their pros and cons" Yes that would have been clearer, but the interviewer is also testing how you respond to situations where the requirements are unclear.

About the solution

In an interview setting I would answer something along the lines of:

I don't think it will be productive to write an actual parser here during the interview. This is because standards compliant parsers are time consuming to write and validate and there are already plenty of implementations for different languages. However, I will describe how to use two popular parsers and how I have used them on past projects. I'll also discuss the pros and cons of each approach.

The first approach that can be used when parsing XML is to treat the entire document as a tree of nodes. This type of parser is called a DOM parser. I used a DOM parser on {insert relevant project experience}. We used a DOM parser because we needed to access different parts of the document at the same time.

The second approach that can be used when parsing XML is to treat the document as a stream of events or nodes. This type of approach is called a SAX parser, I used a SAX parser on {insert relevant project experience}. We used a SAX parser because we couldn't fit the entire document in memory.

{insert discussion on pros and cons}

Further reading

http://www.cs.nmsu.edu/~epontell/courses/XML/material/xmlparsers.html

+1 for analyzing the purpose of interview questions. I think any questions that involve parsing in general are either obviously difficult, or more difficult than they appear. Only thing your answer lacks is perhaps a bit of discussion on how one might theoretically begin implementing a parser. This will show you have knowledge of parsing / compilers which your employer might find impressive (if you actually do have knowledge that is). — yarian, Jun 28 '11 at 00:16
@YGomez Thank you for the compliment. I thought about including some basics of writing a parser, but felt within the context (of a Salesforce.com interview question) that it was over kill. Maybe if it was a Microsoft, Facebook, or Google interview question... (Where "hard" CS problems are a part of the job) I have found that matching the detail to the audience is sometimes as important as the actual answer. — ccozad, Jun 28 '11 at 04:00
/YGomez: are you claiming that salesforce.com has less caliber than MS, FB, Google ? — codeObserver, Jun 30 '11 at 08:12
I am claiming that the scope and scale of programming problems is not as theoretical as the other companies. Microsoft has Windows, office, SQL server, etc. FB has to deal with a massive social graph. Google has to deal with indexing the entire internet. — ccozad, Jun 30 '11 at 17:12

Algorithm for Parsing an XML

1 Answers1