
I have two XML files: one from before the user edited it and one from after. I need to check that the user has only added new elements and has not deleted or changed existing ones.

Can anybody suggest a good algorithm for this comparison?

PS: My XML has a very simple schema. It just represents an object's structure (with nested objects) in a naive way. Only a few tags are allowed: an <object> tag can contain only a <name> tag, a <type> tag, nested <object> tags, or a <list> tag. The <name> and <type> tags can contain only a string. A <list> tag contains a <name> tag and a single <object> tag (describing the structure of the objects in the list). The string in the <name> tag can be chosen freely, while the string in the <type> tag can only be "string", "int", "float", "bool", "date" or "composite".

Here is an example:

 <object>
      <name>Person</name>
      <type>composite</type>

      <object>
            <name>Person_Name</name>
            <type>string</type>
      </object>

      <object>
            <name>Person_Surname</name>
            <type>string</type>
      </object>

      <object>
            <name>Person_Age</name>
            <type>int</type>
      </object>

      <object>
            <name>Person_Weight</name>
            <type>float</type>
      </object>

      <object>
            <name>Person_Address</name>
            <type>string</type>
      </object>

      <object>
            <name>Person_BirthDate</name>
            <type>date</type>
      </object>

      <list>
            <name>Person_PhoneNumbers</name>

            <object>
                  <name>Person_PhoneNumber</name>
                  <type>composite</type>

                  <object>
                        <name>Person_PhoneNumber_ProfileName</name>
                        <type>string</type>
                  </object>
                  <object>
                        <name>Person_PhoneNumber_CellNumber</name>
                        <type>string</type>
                  </object>
                  <object>
                        <name>Person_PhoneNumber_HomeNumber</name>
                        <type>string</type>
                  </object>
                  <object>
                        <name>Person_PhoneNumber_FaxNumber</name>
                        <type>string</type>
                  </object>
                  <object>
                        <name>Person_PhoneNumber_Mail</name>
                        <type>string</type>
                  </object>
                  <object>
                        <name>Person_PhoneNumber_Social</name>
                        <type>string</type>
                  </object>
                  <object>
                        <name>Person_PhoneNumber_IsActive</name>
                        <type>bool</type>
                  </object>
            </object>
      </list>
 </object>
Skary
  • How do your users edit the XML? Why not give them an interface that only allows adding new nodes? – jac Mar 02 '15 at 23:14

2 Answers


You said:

I need to check that user have only added new elements but have not deleted or changed old ones.

Can you be more precise about what you mean?

For example, if I insert a new "object" element somewhere, I've changed every element that contains it, right? That includes every list and object it is nested inside. In fact, any insertion at all is a change to the root element.

So presumably you don't want an insertion to count as a change to the elements that contain it, up to and including the root. What about adding a new item to the list you show? Should the list count as changed? And what if objects in the list, or the list itself, are moved to new places without their content changing at all?

Each of those policies is easy to implement, but you first have to decide what counts as a change.

If, for example, you only care about bottom-level objects, and "the same" means precisely the same text content (no attributes, whitespace variations, etc.), then the easiest way is to load the "before" file into a list of (name, type) pairs, then load the "after" file into a similar but separate list. Sort both lists, then walk down them simultaneously and report anything in the new one that isn't in the old one (you'll probably want to report any deletions too, just in case).
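A minimal C# sketch of that sorted-list approach, assuming "bottom-level object" means an <object> with no nested <object> or <list> children and that only the (name, type) pair of each one matters. The LeafDiff class and its method names are illustrative, not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LeafDiff
{
    // One (name, type) pair per bottom-level <object>, i.e. an <object>
    // without nested <object> or <list> children.
    public static List<(string Name, string Type)> LoadLeafPairs(XDocument doc)
    {
        return doc.Descendants("object")
            .Where(o => !o.Elements("object").Any() && !o.Elements("list").Any())
            .Select(o => (Name: (string)o.Element("name") ?? "", Type: (string)o.Element("type") ?? ""))
            .ToList();
    }

    // Ordinal comparison: by name first, then by type
    static int Compare((string Name, string Type) a, (string Name, string Type) b)
    {
        int c = string.CompareOrdinal(a.Name, b.Name);
        return c != 0 ? c : string.CompareOrdinal(a.Type, b.Type);
    }

    // Sorts both lists, then walks them side by side (like the merge step of merge sort)
    public static (List<(string Name, string Type)> Added, List<(string Name, string Type)> Removed) Diff(
        List<(string Name, string Type)> before, List<(string Name, string Type)> after)
    {
        var oldList = new List<(string Name, string Type)>(before);
        var newList = new List<(string Name, string Type)>(after);
        oldList.Sort(Compare);
        newList.Sort(Compare);

        var added = new List<(string Name, string Type)>();
        var removed = new List<(string Name, string Type)>();
        int i = 0, j = 0;
        while (i < oldList.Count || j < newList.Count)
        {
            if (j == newList.Count) { removed.Add(oldList[i++]); continue; }   // leftovers in "before"
            if (i == oldList.Count) { added.Add(newList[j++]); continue; }     // leftovers in "after"

            int c = Compare(oldList[i], newList[j]);
            if (c == 0) { i++; j++; }                      // pair present in both files
            else if (c < 0) removed.Add(oldList[i++]);     // only in "before"
            else added.Add(newList[j++]);                  // only in "after"
        }
        return (added, removed);
    }
}
```

Because the merge reports both sides, a changed <type> shows up as one removal plus one addition, so "only additions" corresponds to an empty Removed list.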

TextGeek
  • You are right, I was not clear about what counts as a change. What I mean is that you can't change the name or type of existing objects or delete them, but you can add substructures (nested objects). All the objects are in fact a schema describing object structures, so a list does not contain elements; it only contains the definition of the single object schema it is bound to. So the same rule applies to lists as to other objects: you can add new structure to the object a list is bound to, but you can't rename or delete existing parts. – Skary Mar 03 '15 at 05:28
  • Sorry for the double comment, but let me make sure I understand. You propose creating two lists (old XML and new XML), each containing every element's hierarchy path together with its type (a trivial example would be the concatenation of the ancestors' names + the current element's name + a special separator character + the type name), sorting both lists, and then checking whether the second list contains all the elements of the first? At first sight it seems to work and it's pretty simple, or am I missing something crucial? – Skary Mar 03 '15 at 05:39
  • That will work if it doesn't matter where an "object" occurs in the hierarchy, or if, as you mentioned, you join all the element types (like "object/list/object#composite" or something). If order matters (say I move one sibling later), then you need to keep child numbers too: "object_1/list_1/object_3#composite". If you don't care about nesting at all (moving an object still counts as "the same object"), then you can just flatten everything into one array of objects. – TextGeek Mar 03 '15 at 19:11
  • No, moving an object counts as a change in my scenario, but I don't care about ordering. So thanks, your algorithm seems to be a simple and good solution. – Skary Mar 04 '15 at 06:59
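A minimal sketch of the path-key idea worked out in these comments, assuming names never contain the '/' and '#' separators and that a <list> is keyed by its <name> with the marker "list" in place of a type. PathKeyCheck, BuildKeys and MissingKeys are hypothetical names. Sibling order is ignored because the keys go into a set, while moving or renaming an object changes its key and therefore shows up as a deletion:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PathKeyCheck
{
    static bool IsNode(XElement e) => e.Name == "object" || e.Name == "list";

    // One key per <object>/<list>, e.g. "Person/Person_PhoneNumbers/Person_PhoneNumber#composite".
    // Assumes names never contain '/' or '#', which are used as separators here.
    public static HashSet<string> BuildKeys(XDocument doc)
    {
        var keys = new HashSet<string>(StringComparer.Ordinal);
        foreach (XElement e in doc.Descendants().Where(IsNode))
        {
            // <name> of this element and of every enclosing <object>/<list>, outermost first
            IEnumerable<string> path = e.AncestorsAndSelf()
                .Where(IsNode)
                .Reverse()
                .Select(a => (string)a.Element("name") ?? "");

            // <list> has no <type>, so it gets the marker "list" instead
            string type = e.Name == "list" ? "list" : ((string)e.Element("type") ?? "");
            keys.Add(string.Join("/", path) + "#" + type);
        }
        return keys;
    }

    // Keys from "before" that no longer exist in "after"; empty means only additions were made.
    public static List<string> MissingKeys(XDocument before, XDocument after)
    {
        HashSet<string> newKeys = BuildKeys(after);
        return BuildKeys(before)
            .Where(k => !newKeys.Contains(k))
            .OrderBy(k => k, StringComparer.Ordinal)
            .ToList();
    }
}
```

One limitation of using a set: two siblings with the same name and type collapse into a single key, so a duplicated entry cannot be told apart from the original.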

I need to check that user have only added new elements but have not deleted or changed old ones.

You can represent your two XML files as objects. Traverse the nodes, get the child element count for each node, and check whether its child nodes exist in the other file. To compare two complex objects, you can implement the IEquatable<T>.Equals() interface method (see the documentation for IEquatable<T>).

The code below doesn't care about the overall structure of your XML document or about the position at which a particular element appears, since each element is represented by an XElement object. All it works with are generic properties of each element, such as its name, its child elements, its attributes, or its inner text. If you want to be strict about the structure of your XML, you can represent each level as its own class.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Program
{
    static void Main(string[] args)
    {
        XDocument xdoc1 = XDocument.Load("file1.xml");
        XDocument xdoc2 = XDocument.Load("file2.xml");

        // XDocument.Root is the single top-level element of each file
        RootElement file1 = new RootElement(xdoc1.Root);
        RootElement file2 = new RootElement(xdoc2.Root);

        bool isEqual = file1.Equals(file2);
        Console.WriteLine("Files are equal: " + isEqual);

        Console.ReadLine();
    }
}

public abstract class ElementBase<T> where T : IEquatable<T>
{
    public string Name;
    public string Value;   // text of leaf elements such as <name> and <type>; null otherwise
    public List<T> ChildElements = new List<T>();

    protected ElementBase(XElement xElement)
    {
        Name = xElement.Name.ToString();
        Value = xElement.HasElements ? null : xElement.Value;
    }

    // Shared comparison: same name, same leaf text, and every child has an equal counterpart
    protected bool SameContentAs(ElementBase<T> other)
    {
        if (other == null || Name != other.Name || Value != other.Value)
        {
            //--Your error handling logic here (element renamed or its text changed)
            return false;
        }

        bool flag = true;

        if (ChildElements.Count != other.ChildElements.Count)
        {
            //--Your error handling logic here (children added or removed)
            flag = false;
        }

        // Siblings often share a tag name (several <object>s), so look for an equal,
        // not yet matched child rather than the first child with the same name
        List<T> unmatched = new List<T>(other.ChildElements);
        foreach (T child in ChildElements)
        {
            int index = unmatched.FindIndex(x => child.Equals(x));

            if (index < 0)
            {
                //--Your error handling logic here (child missing or changed)
                flag = false;
            }
            else
            {
                unmatched.RemoveAt(index);
            }
        }

        return flag;
    }
}

public class RootElement : ElementBase<ChildElement>, IEquatable<RootElement>
{
    public RootElement(XElement xElement)
        : base(xElement)
    {
        foreach (XElement e in xElement.Elements())
        {
            ChildElements.Add(new ChildElement(e));
        }
    }

    public bool Equals(RootElement other)
    {
        return SameContentAs(other);
    }
}

public class ChildElement : ElementBase<ChildElement>, IEquatable<ChildElement>
{
    public ChildElement(XElement xElement)
        : base(xElement)
    {
        foreach (XElement e in xElement.Elements())
        {
            ChildElements.Add(new ChildElement(e));
        }
    }

    public bool Equals(ChildElement other)
    {
        return SameContentAs(other);
    }
}

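If you also want to check attributes, you can do it like this (inner text can be compared the same way through XElement.Value):

// In ElementBase<T>: a list holding the element's attributes
public List<XAttribute> ElementAttributes = new List<XAttribute>();

// In the constructor: collect the attributes of xElement
foreach (XAttribute attr in xElement.Attributes())
{
    ElementAttributes.Add(attr);
}

// In the comparison method (where flag is declared): every attribute
// must exist on the other element with the same value
List<XAttribute> otherAttributes = other.ElementAttributes;
foreach (XAttribute attr in ElementAttributes)
{
    XAttribute otherAttribute = otherAttributes.FirstOrDefault(x => x.Name == attr.Name);

    if (otherAttribute == null)
    {
        //--Your error handling logic here (attribute removed)
        flag = false;
    }
    else if (otherAttribute.Value != attr.Value)
    {
        //--Your error handling logic here (attribute value changed)
        flag = false;
    }
}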
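The traversal described in this answer can also be written as a one-directional check, which fits the question's "additions only" rule more directly than a strict Equals(). A minimal sketch, assuming siblings are identified by their tag plus the text of their <name> child (matching on the tag alone would compare every <object> with the first one). AdditionsOnlyCheck and FindViolations are illustrative names, not an existing API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AdditionsOnlyCheck
{
    // Reports every <object>/<list> from "before" that is missing from "after"
    // or whose <type> changed. Extra nodes in "after" are allowed.
    public static List<string> FindViolations(XElement before, XElement after)
    {
        var problems = new List<string>();
        if (before.Name != after.Name || NameOf(before) != NameOf(after))
            problems.Add("Root changed: " + NameOf(before));
        else
            Compare(before, after, problems);
        return problems;
    }

    // Text of the <name> child, or "" if there is none
    static string NameOf(XElement e) => (string)e.Element("name") ?? "";

    static void Compare(XElement oldNode, XElement newNode, List<string> problems)
    {
        // <list> has no <type>, so both sides yield null and compare equal
        if ((string)oldNode.Element("type") != (string)newNode.Element("type"))
            problems.Add("Type changed: " + NameOf(oldNode));

        foreach (XElement oldChild in oldNode.Elements().Where(c => c.Name == "object" || c.Name == "list"))
        {
            // Assumption: a sibling is identified by its tag plus its <name> text, so order is ignored
            XElement newChild = newNode.Elements(oldChild.Name)
                .FirstOrDefault(c => NameOf(c) == NameOf(oldChild));

            if (newChild == null)
                problems.Add("Missing or renamed: " + NameOf(oldChild));
            else
                Compare(oldChild, newChild, problems);   // recurse into nested objects
        }
    }
}
```

Since only the "before" tree drives the walk, new siblings or nested objects in the "after" file are never reported, while deletions, renames, moves and type changes are.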
jmc