
Here is what I want to do:

  • Save the xml file to my computer from a url
  • Parse it and grab the information that I want (which isn't all of it)
  • Compare the parsed information to yesterday's version of the xml

I can do this in several ways, but I want the most memory-efficient approach. I also don't want parsing and comparing the files to take forever.

Option 1:

  • Directly parse the xml from the url and save it into an array
  • Iterate through the array and create a new xml file with only the parsed information I want doing something like this to create the new xml file.
  • Compare the two xml files
  • Write new xml file based on the differences between the xml

Option 2:

  • Download the xml file using any of these suggested methods (will this keep the xml structure?)
  • Parse the xml into an array
  • Compare the two xml files
  • Write a new xml

These are the two options I've been looking into, but I know there are more. I'm not sure whether the others are more effective, and I haven't had direct internet access on my computer for a few days, so I can't test them against each other. When I was able to test a while back, I noticed that parsing the information directly from the website takes a while.

The xml structure looks something like this:

<Data> 
    <User>
       <ID>1</ID>
       <Name>Bob</Name>
       <Age>18</Age>
       <IsOnline>false</IsOnline>
       <Sport>Basketball</Sport>
       <GymPresence>
           <LastSeen>April 12 2013</LastSeen>
           <Picture>www.gym.com/picId=10000</Picture>
           <Weights>
               <Machine>Bench</Machine>
               <Weight>175</Weight>
               <Reps>8</Reps>
           </Weights>
       </GymPresence>
    </User>
    <User>
       <ID>2</ID>
       <Name>Joe</Name>
       <Age>23</Age>
       <IsOnline>false</IsOnline>
       <Sport>Baseball</Sport>
       <GymPresence>
           <LastSeen>April 10 2013</LastSeen>
           <Picture>www.gym.com/picId=10001</Picture>
           <Weights>
               <Machine>Bench</Machine>
               <Weight>205</Weight>
               <Reps>8</Reps>
           </Weights>
       </GymPresence>
    </User>
    ...
    ... # 3 through 124
    ...
    <User>
       <ID>125</ID>
       <Name>Amy</Name>
       <Age>17</Age>
       <IsOnline>false</IsOnline>
       <Sport>Volleyball</Sport>
       <GymPresence>
           <LastSeen>April 13 2013</LastSeen>
           <Picture>www.gym.com/picId=10124</Picture>
           <Weights>
               <Machine>Bench</Machine>
               <Weight>105</Weight>
               <Reps>5</Reps>
           </Weights>
       </GymPresence>
    </User> 
</Data>

Overall, I'm wondering what the best option is for parsing, comparing, and writing an xml file.

When I was able to test it online, it took a while to parse the xml without saving it first. It went considerably faster when the xml file was located on my computer. But would downloading the file preserve the xml format? Is it worth keeping the information I don't need from the xml in case I need it later on? Or would I have to parse it and write it back out (which seems like it would take longer) to keep the format?

WilliamShatner
  • Regardless of what you are downloading, it comes down to bytes. If your URL stream is serving up bytes that form correct XML, then that is what you will end up with (provided you read from the stream correctly). – Sotirios Delimanolis May 13 '13 at 18:51

1 Answer

When comparing things like XML or JSON or any other serialization format, you are more concerned with the data than the binary content. What I mean is that

<Reps>8</Reps>

is equivalent to

<Reps       >8</Reps>
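
To make that concrete, here is a minimal sketch that compares the two snippets first as raw strings and then as parsed data. It uses the JDK's built-in DOM parser rather than JAXB, purely so it runs without extra dependencies; the class and method names (`RepsEquivalence`, `parseRoot`, `sameData`) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class RepsEquivalence {

    // Parse an XML string and return its root element (hypothetical helper).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Two single-element documents carry the same data when the tag name
    // and the text content match, regardless of whitespace inside the tag.
    static boolean sameData(String a, String b) {
        Element x = parseRoot(a);
        Element y = parseRoot(b);
        return x.getTagName().equals(y.getTagName())
                && x.getTextContent().equals(y.getTextContent());
    }

    public static void main(String[] args) {
        String compact = "<Reps>8</Reps>";
        String padded = "<Reps       >8</Reps>";

        System.out.println(compact.equals(padded));    // false: the bytes differ
        System.out.println(sameData(compact, padded)); // true: the data is the same
    }
}
```

This only handles a single leaf element; it is meant to show why a byte-level comparison is the wrong tool, not to be a general XML comparator.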

My suggestion is to download the XML file and use a library like JAXB to parse and convert (keyword: unmarshal) the contents of the file to Java objects (or a list/set of them). Do the same with your previous version of the file, then compare the Java objects. With Sets, you can calculate the difference between the two and thus create a new file containing only the differences (keyword: marshal). For the Set difference to work, the objects need `equals` and `hashCode` implementations based on the fields you care about.
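
The compare step could look roughly like the sketch below. To keep it runnable on a plain JDK it swaps JAXB for the built-in DOM parser and represents each user as a `List<String>` of three fields (ID, name, bench weight), which already has value-based `equals`/`hashCode`; with JAXB you would use your own annotated class instead. The class name `UserDiff`, the helper names, and the choice of fields are all assumptions for illustration, and writing the result back out (marshalling) is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class UserDiff {

    // Text of the first descendant element with the given tag name.
    // Assumes every <User> actually contains that element.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Keep only the fields of interest for each <User>: ID, Name, Weight.
    static Set<List<String>> parseUsers(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList users = doc.getElementsByTagName("User");
            Set<List<String>> result = new LinkedHashSet<>();
            for (int i = 0; i < users.getLength(); i++) {
                Element u = (Element) users.item(i);
                result.add(Arrays.asList(text(u, "ID"), text(u, "Name"), text(u, "Weight")));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Set difference: entries present today that were not present yesterday.
    static Set<List<String>> changedSince(Set<List<String>> yesterday, Set<List<String>> today) {
        Set<List<String>> diff = new LinkedHashSet<>(today);
        diff.removeAll(yesterday);
        return diff;
    }

    public static void main(String[] args) {
        String yesterday = "<Data>"
                + "<User><ID>1</ID><Name>Bob</Name><Weights><Weight>175</Weight></Weights></User>"
                + "<User><ID>2</ID><Name>Joe</Name><Weights><Weight>205</Weight></Weights></User>"
                + "</Data>";
        // Same users, but Bob's bench weight changed.
        String today = yesterday.replace("<Weight>175</Weight>", "<Weight>185</Weight>");

        System.out.println(changedSince(parseUsers(yesterday), parseUsers(today)));
        // prints [[1, Bob, 185]]
    }
}
```

Note that a plain set difference reports a changed user as a "new" entry; if you need to tell added users from modified ones, keying the data by ID in a `Map` is probably the better fit.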

Sotirios Delimanolis
  • Thank you for the suggestion. It definitely cleared up whether I should download the file or not. In the case that I want to view the file in VIM or whatever other viewer later on, is there a way to download it with the proper formatting/binary content? – WilliamShatner May 13 '13 at 19:24
  • The [question you linked](http://stackoverflow.com/questions/921262/how-to-download-and-save-a-file-from-internet-using-java) does that. I don't know what is worrying you that it won't be the correct binary content. – Sotirios Delimanolis May 13 '13 at 19:25
  • Perhaps it just shows up oddly when viewing it from notepad. It looked much cleaner on the website than notepad. I'm not too worried about it, it wasn't a necessity. Thanks again! – WilliamShatner May 13 '13 at 19:30
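
For reference, the byte-for-byte download discussed in the comments can be sketched with `java.nio.file.Files.copy`, which writes the stream to disk without interpreting it, so the saved file contains exactly what the server sent. The class name `XmlDownload` and the command-line arguments are placeholders, not something from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class XmlDownload {

    // Copy the raw bytes behind the URL to the target file, overwriting any
    // existing file. Returns the number of bytes written.
    static long download(URL source, Path target) {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java XmlDownload <url> <target-file>");
            return;
        }
        long bytes = download(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Because nothing is parsed or re-serialized here, the XML structure is untouched; any parsing happens later against the local copy.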