I'm trying to figure out the best way to accomplish the following:
- Download a large XML (1GB) file on daily basis from a third-party website
- Convert that XML file to relational database on my server
- Add functionality to search the database
For the first part, is this something that would need to be done manually, or could it be accomplished with a cron?
Most of the questions and answers related to XML and relational databases refer to Python or PHP. Could this be done with javascript/nodejs as well?
If this question is better suited for a different StackExchange forum, please let me know and I will move it there instead.
Below is a sample of the xml code:
<case-file>
<serial-number>123456789</serial-number>
<transaction-date>20150101</transaction-date>
<case-file-header>
<filing-date>20140101</filing-date>
</case-file-header>
<case-file-statements>
<case-file-statement>
<code>AQ123</code>
<text>Case file statement text</text>
</case-file-statement>
<case-file-statement>
<code>BC345</code>
<text>Case file statement text</text>
</case-file-statement>
</case-file-statements>
<classifications>
<classification>
<international-code-total-no>1</international-code-total-no>
<primary-code>025</primary-code>
</classification>
</classifications>
</case-file>
Here's some more information about how these files will be used:
All XML files will be in the same format. There are probably a few dozen elements within each record. The files are updated by a third party on a daily basis (and are available as zipped files on the third-party website). Each day's file represents new case files as well as updated case files.
The goal is to allow a user to search for information and organize those search results on the page (or in a generated pdf/excel file). For example, a user might want to see all case files that include a particular word within the <text>
element. Or a user might want to see all case files that include primary code 025 (<primary-code>
element) and that were filed after a particular date (<filing-date>
element).
The only data entered into the database will be from the XML files--users won't be adding any of their own information to the database.