I'm new to Java, and I'd like the community's opinion. I have a huge XML file, approximately 140 MB, that contains a lot of information. Much of that information is no longer valid, so I need to filter it and use only the valid part. To check validity, I need to cross-reference information between nodes and decide whether a deletion is needed. In some cases, the entire parent (main) node needs to be deleted.
I'm already doing this with a DOM parser: I loop over the nodes, save values in variables inside the loops, cross-check them, and delete either the current node or the entire parent node.
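To make the current approach concrete, here is a minimal sketch of a DOM-based pruning loop. The class name DomPruneSketch and the shouldKeep rule are hypothetical stand-ins (the real cross-checks are more involved); the element names come from the sample structure in this question. It collects the nodes to delete first and removes them afterwards, because the NodeList returned by getElementsByTagName is live and changes while nodes are removed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DomPruneSketch {

    // Hypothetical placeholder rule: keep a <main> only if at least one
    // <block_information> has a non-empty <name>. The real rule is stricter.
    static boolean shouldKeep(Element main) {
        NodeList blocks = main.getElementsByTagName("block_information");
        for (int i = 0; i < blocks.getLength(); i++) {
            NodeList names = ((Element) blocks.item(i)).getElementsByTagName("name");
            if (names.getLength() > 0
                    && !names.item(0).getTextContent().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    // Parses the XML into a DOM tree and removes every <main> that fails shouldKeep.
    static Document prune(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList mains = doc.getElementsByTagName("main");
            // Collect first: removing while iterating a live NodeList skips nodes.
            List<Element> toRemove = new ArrayList<>();
            for (int i = 0; i < mains.getLength(); i++) {
                Element main = (Element) mains.item(i);
                if (!shouldKeep(main)) {
                    toRemove.add(main);
                }
            }
            for (Element main : toRemove) {
                main.getParentNode().removeChild(main);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or prune the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<source>"
                + "<main><id>1</id><block_information><name>A</name></block_information></main>"
                + "<main><id>2</id><block_information><name></name></block_information></main>"
                + "</source>";
        Document doc = prune(xml);
        System.out.println(doc.getElementsByTagName("main").getLength());
    }
}
```

With DOM the whole document is held in memory at once, which is the part I am unsure about for a file of this size.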
Basically, the structure is like this:
<source>
  <main>
    <id>98567</id>
    <block_information>
      <name>Block A</name>
      <start_date>20120210</start_date>
      <end_date>20150210</end_date>
    </block_information>
    <block_information>
      <name>Block A.01</name>
      <start_date>20150210</start_date>
      <end_date>20251005</end_date>
    </block_information>
    <city_information>
      <name>Manchester</name>
      <start_date>20150210</start_date>
      <end_date>20150212</end_date>
    </city_information>
    <city_information>
      <name>New Manchester</name>
      <start_date>20150212</start_date>
      <end_date>20251005</end_date>
    </city_information>
    <phone>
      <type>C</type>
      <number>987466321</number>
      <name></name>
    </phone>
    <phone>
      <type>P</type>
      <number>36547821</number>
      <name></name>
    </phone>
  </main>
  <main>
    <id>19587</id>
    <block_information>
      <name>Che</name>
      <start_date>20090210</start_date>
      <end_date>20100210</end_date>
    </block_information>
    <block_information>
      <name></name>
      <start_date>20100210</start_date>
      <end_date>20351005</end_date>
    </block_information>
    <city_information>
      <name></name>
      <start_date>20150210</start_date>
      <end_date>20150212</end_date>
    </city_information>
    <city_information>
      <name>No Name</name>
      <start_date>20150212</start_date>
      <end_date>20191005</end_date>
    </city_information>
    <phone>
      <type>C</type>
      <number>987466321</number>
      <name>Mom</name>
    </phone>
    <phone>
      <type>P</type>
      <number>36547821</number>
      <name></name>
    </phone>
  </main>
</source>
The output is like this:
<result>
  <main>
    <id>98567</id>
    <block_name>Block A.01</block_name>
    <city_name>New Manchester</city_name>
    <cellphone></cellphone>
    <phone>36547821</phone>
    <contact_phone></contact_phone>
    <contact_phone_name></contact_phone_name>
  </main>
</result>
For a <main> to appear in the result, it must have exactly one valid <block_information> and exactly one valid <city_information>. An entry is valid when its <start_date> is earlier than the current date and its <end_date> is later than the current date, and a non-empty <name> is required for both. If there is no valid entry, or more than one, the <main> is deleted.
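The validity rule above could be sketched roughly as follows. The names ValidityFilter, Period, isValid and singleValidName are hypothetical, introduced only for illustration. The sketch assumes the dates always use the yyyyMMdd pattern seen in the sample, that both comparisons are strict, and that an entry with an empty <name> simply does not count as valid.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ValidityFilter {

    // Hypothetical holder for one <block_information> or <city_information> entry.
    static final class Period {
        final String name;
        final String startDate;
        final String endDate;

        Period(String name, String startDate, String endDate) {
            this.name = name;
            this.startDate = startDate;
            this.endDate = endDate;
        }
    }

    // BASIC_ISO_DATE parses dates written as yyyyMMdd, e.g. 20150210.
    private static final DateTimeFormatter FMT = DateTimeFormatter.BASIC_ISO_DATE;

    // Valid: non-empty name, start strictly before today, end strictly after today.
    static boolean isValid(Period p, LocalDate today) {
        if (p.name == null || p.name.trim().isEmpty()) {
            return false;
        }
        LocalDate start = LocalDate.parse(p.startDate, FMT);
        LocalDate end = LocalDate.parse(p.endDate, FMT);
        return start.isBefore(today) && end.isAfter(today);
    }

    // Returns the name of the single valid entry, or empty when there is
    // none or more than one (in which case the whole <main> would be dropped).
    static Optional<String> singleValidName(List<Period> periods, LocalDate today) {
        List<String> names = new ArrayList<>();
        for (Period p : periods) {
            if (isValid(p, today)) {
                names.add(p.name);
            }
        }
        return names.size() == 1 ? Optional.of(names.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Period> blocks = Arrays.asList(
                new Period("Block A", "20120210", "20150210"),
                new Period("Block A.01", "20150210", "20251005"));
        // The result depends on the date the program is run.
        System.out.println(singleValidName(blocks, LocalDate.now()));
    }
}
```

Passing the reference date in as a parameter, instead of calling LocalDate.now() inside the check, keeps the rule easy to test and to adjust later.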
For the phone numbers, <type> can be 'C' (contact), 'P' (personal phone), or 'M' (mobile). If the <type> is 'C' but there is no value in <name>, the phone does not go to the result. 'P' goes to <phone> and 'M' goes to <cellphone>.
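As a rough sketch of the phone rules, assuming that a 'C' entry with a name fills <contact_phone> and <contact_phone_name> (inferred from the sample output, not stated as a rule), and with PhoneMapper, emptyResult and apply being hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PhoneMapper {

    // Starts with every phone-related output element empty, in the sample's order.
    static Map<String, String> emptyResult() {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("cellphone", "");
        out.put("phone", "");
        out.put("contact_phone", "");
        out.put("contact_phone_name", "");
        return out;
    }

    // Applies one <phone> entry to the output map according to its <type>.
    static void apply(String type, String number, String name, Map<String, String> out) {
        switch (type) {
            case "P":
                out.put("phone", number);
                break;
            case "M":
                out.put("cellphone", number);
                break;
            case "C":
                // A contact phone without a name is skipped.
                if (name != null && !name.trim().isEmpty()) {
                    out.put("contact_phone", number);
                    out.put("contact_phone_name", name);
                }
                break;
            default:
                // Unknown types are ignored in this sketch.
                break;
        }
    }

    public static void main(String[] args) {
        Map<String, String> out = emptyResult();
        apply("C", "987466321", "", out);
        apply("P", "36547821", "", out);
        System.out.println(out);
    }
}
```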
I'd like your thoughts on the best way to do this with good performance, while keeping the code easy for anyone to adjust later if needed.
Thanks in advance for your input!
As requested by @kjhughes, I added some values to the sample XML and described some of the filters I need to apply. Thanks!
P.S.: The XML structure used as an example is much simpler than the actual one, which has many more complex types.