
I am working on a Python project and am stuck on a problem: comparing two XML files. For instance, assume we have the following two XML files:

A file:

<m1:time timeinterval="5">
   <m1:vehicle distance="40" speed="5"/>

   <m1:location hours = "1" path = '1'>
      <m1:feature color="2" type="a">564</m1:feature>
      <m1:feature color="3" type="b">570</m1:feature>
      <m1:feature color="4" type="c">570</m1:feature>
   </m1:location>

   <m1:location hours = "5" path = '1'>
      <m1:feature color="6" type="a">560</m1:feature>
      <m1:feature color="7" type="b">570</m1:feature>
      <m1:feature color="8" type="c">580</m1:feature>
   </m1:location>

   <m1:location hours = "9" path = '1'>
      <m1:feature color="10" type="a">560</m1:feature>
      <m1:feature color="11" type="b">570</m1:feature>
      <m1:feature color="12" type="c">580</m1:feature>
   </m1:location>
</m1:time>

B file:

<m1:time timeinterval="6">
   <m1:vehicle distance="40" speed="5"/>

   <m1:location hours = "5" path = '1'>
      <m1:feature color="6" type="a">560</m1:feature>
      <m1:feature color="7" type="b">570</m1:feature>
      <m1:feature color="8" type="c">580</m1:feature>
   </m1:location>

   <m1:location hours = "1" path = '1'>
      <m1:feature color="2" type="a">564</m1:feature>
      <m1:feature color="3" type="b">570</m1:feature>
      <m1:feature color="4" type="c">570</m1:feature>
   </m1:location>

   <m1:location hours = "9" path = '1'>
      <m1:feature color="10" type="a">560</m1:feature>
      <m1:feature color="11" type="b">570</m1:feature>
      <m1:feature color="12" type="c">580</m1:feature>
   </m1:location>

</m1:time>
  • How can I compare file A with file B in Python so that they are reported as the same, even though the "location" elements appear in a different order in the two files?
  • I have tried several approaches and also looked at this question, but in this project I want to develop an approach of my own and I can't use any existing tools.

The approach which I have tried so far is:

I am working with lxml. I read the attributes of each child element from file A and store them in a list, then compare the elements and child attributes of file B against the values stored in that list.

First of all, this approach is not working, and I can't think of an efficient procedure to accomplish this task. Can you shed some light on this?

Thank you.

Radheya

1 Answer


Sounds like you need an XML parser. My first suggestion would be to use a DOM parser (or write a very basic one yourself). By reading both XML files and then comparing the resulting trees, you can easily verify whether they are the same.
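A minimal sketch of such a tree comparison, not a definitive implementation. It uses the standard library's `xml.etree.ElementTree` rather than lxml, and the names `canonical` and `same_xml`, the shortened sample documents, and the namespace URI `http://example.com/m1` are all made up for illustration (the snippets in the question do not declare the `m1` prefix). Each element is turned into a nested tuple with its children sorted, so sibling order no longer matters:

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Return an order-independent, comparable form of an element."""
    # Sorting the children's canonical forms removes sibling order.
    children = sorted(canonical(child) for child in elem)
    text = (elem.text or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    return (elem.tag, attrs, text, tuple(children))


def same_xml(xml_a, xml_b):
    """True if both documents have the same content, ignoring sibling order."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


# Hypothetical namespace URI, only needed so the m1 prefix can be parsed.
doc_a = """<m1:time xmlns:m1="http://example.com/m1" timeinterval="5">
   <m1:location hours="1" path="1">
      <m1:feature color="2" type="a">564</m1:feature>
   </m1:location>
   <m1:location hours="5" path="1">
      <m1:feature color="6" type="a">560</m1:feature>
   </m1:location>
</m1:time>"""

doc_b = """<m1:time xmlns:m1="http://example.com/m1" timeinterval="5">
   <m1:location hours="5" path="1">
      <m1:feature color="6" type="a">560</m1:feature>
   </m1:location>
   <m1:location hours="1" path="1">
      <m1:feature color="2" type="a">564</m1:feature>
   </m1:location>
</m1:time>"""

print(same_xml(doc_a, doc_b))
```

Note that this compares the root attributes too, so the two files in the question would still differ because of `timeinterval`.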

This is not very efficient, though. It is possible to do the verification while reading the second XML file. You would then, however, have to remove the elements that match, to make sure that no unmatched elements are left behind.
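One way this match-and-remove idea could look, as a rough sketch under a few assumptions: the first document is small enough to load completely, only the direct children of the root are compared, and the root's own attributes are ignored. The helper names `signature` and `children_match` and the namespace URI are hypothetical. The first file's children are counted in a `collections.Counter`, and each child of the second file is ticked off as `iterparse` finishes reading it:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def signature(elem):
    """Order-independent, hashable description of an element."""
    children = sorted(signature(child) for child in elem)
    text = (elem.text or "").strip()
    return (elem.tag, tuple(sorted(elem.attrib.items())), text, tuple(children))


def children_match(xml_a, xml_b):
    """Compare the root's direct children of two documents, ignoring order."""
    # Every child of the first document starts out as "unmatched".
    unmatched = Counter(signature(child) for child in ET.fromstring(xml_a))
    depth = 0
    stream = io.BytesIO(xml_b.encode("utf-8"))
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:  # a direct child of the root has just been closed
            sig = signature(elem)
            if unmatched[sig] == 0:
                return False  # present in B but not (or no longer) in A
            unmatched[sig] -= 1  # "remove" the matched element
            elem.clear()  # drop the subtree that is no longer needed
    # Anything still counted was in A but never seen in B.
    return sum(unmatched.values()) == 0


a = """<m1:time xmlns:m1="http://example.com/m1" timeinterval="5">
   <m1:location hours="1" path="1"><m1:feature color="2" type="a">564</m1:feature></m1:location>
   <m1:location hours="5" path="1"><m1:feature color="6" type="a">560</m1:feature></m1:location>
</m1:time>"""

b = """<m1:time xmlns:m1="http://example.com/m1" timeinterval="6">
   <m1:location hours="5" path="1"><m1:feature color="6" type="a">560</m1:feature></m1:location>
   <m1:location hours="1" path="1"><m1:feature color="2" type="a">564</m1:feature></m1:location>
</m1:time>"""

print(children_match(a, b))
```

Using a counter rather than a set means duplicated elements are also checked: two identical locations in one file must be matched by two in the other.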

But I am curious why your list approach isn't working. Can you give some more information about this?

cvesters
  • Basically, the problem is that I have two huge XML files to compare, probably 5 to 6 MB each. Since the data is not always fixed in my case, I want to generate lists dynamically. For instance, in the example above I would like to generate lists like location_hours_1=[], location_hours_5=[], location_hours_9=[]. Once these lists are created, I can compare the other file's contents with each list individually. I followed the approach [here](http://stackoverflow.com/questions/18098326/dynamically-declare-create-lists-in-python) for dynamic generation, but it is showing an error in my case. – Radheya Jun 29 '15 at 14:15
  • I think such a method becomes very hard to understand with deep element structures, making it very error prone. I really suggest making a small tree that can be compared. If you need some help doing so, you can e-mail me. – cvesters Jun 29 '15 at 14:38
  • Hey, thank you for your feedback. Yes, I am currently working on it. I know my dynamic list method is very error-prone and not efficient at all, so another approach is worth trying. By the way, how can I find your email address? There is no email address mentioned on your page. – Radheya Jun 29 '15 at 14:50
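
On the dynamically named lists discussed in the comments: a common alternative to creating variables such as `location_hours_1` is a single dictionary keyed by the `hours` value. The following is only a sketch of that idea, not the asker's code; the function name `features_by_hours` and the namespace URI are assumptions made for illustration, and it uses `xml.etree.ElementTree` from the standard library.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical namespace URI; the real one depends on the actual files.
NS = {"m1": "http://example.com/m1"}


def features_by_hours(xml_text):
    """Group each location's features under its 'hours' attribute."""
    groups = defaultdict(list)
    root = ET.fromstring(xml_text)
    for location in root.findall("m1:location", NS):
        for feature in location.findall("m1:feature", NS):
            groups[location.get("hours")].append(
                (feature.get("color"), feature.get("type"), (feature.text or "").strip())
            )
    # Sort each list so the order of features inside a location is ignored too.
    return {hours: sorted(items) for hours, items in groups.items()}


file_a = """<m1:time xmlns:m1="http://example.com/m1" timeinterval="5">
   <m1:location hours="1" path="1"><m1:feature color="2" type="a">564</m1:feature></m1:location>
   <m1:location hours="5" path="1"><m1:feature color="6" type="a">560</m1:feature></m1:location>
</m1:time>"""

file_b = """<m1:time xmlns:m1="http://example.com/m1" timeinterval="6">
   <m1:location hours="5" path="1"><m1:feature color="6" type="a">560</m1:feature></m1:location>
   <m1:location hours="1" path="1"><m1:feature color="2" type="a">564</m1:feature></m1:location>
</m1:time>"""

print(features_by_hours(file_a) == features_by_hours(file_b))
```

Because dictionaries compare by key and value rather than by insertion order, the order in which the locations appear in each file does not affect the result.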