I have recently been tasked with writing a script that creates resource Data Capture Records (DCRs) from an online XML feed.
I have not done this before, so I would be grateful if anyone could offer key points I should be aware of, background reading I could look at, or other issues and pitfalls I should take into consideration. Any terminology specific to this type of task would also be a big help.
Ideally I would like to do this with jQuery or, if it would be easier, with Perl. My jQuery knowledge is better than my Perl knowledge, though.
My aim is to take a very large online XML feed made up of many `<doc>` elements, each holding a variety of content. An example of the XML, showing a single record, is below.
```xml
<response>
  <result name="response" numFound="3559" start="0">
    <doc>
      <str name="PID">islandora:4466</str>
      <arr name="dc.coverage">
        <str>4466</str>
      </arr>
      <arr name="dc.description">
        <str>
          Text
        </str>
        <str>
          <p><iframe src="http:" width="230" height="230" frameborder="0" allowtransparency="65535" scrolling="auto"></iframe></p>
          <p><a href="/assets/.....">Transcript (DOC, 150KB) </a></p>
        </str>
      </arr>
      <arr name="dc.identifier">
        <str>islandora:4466</str>
      </arr>
      <arr name="dc.subject">
        <str>heav422</str>
        <str>heav533</str>
        <str>heav547</str>
        <str>heav549</str>
        <str>discipline1137</str>
        <str>theme778</str>
      </arr>
      <str name="dc.title">Text</str>
      <arr name="hea.abstract">
        <str> <!-- HTML ready content (example below) -->
          <p>Text</p>
          <ul>
            <li>Text</li>
            <li>Text</li>
            <li>Text</li>
            <li>Text</li>
            <li>Text</li>
            <li>Text</li>
            <li>Text</li>
          </ul>
          <p>Text</p>
        </str>
      </arr>
      <arr name="hea.date">
        <str>2012-05-01 00:00:00</str>
      </arr>
      <arr name="hea.discipline">
        <str>1137</str>
      </arr>
      <arr name="hea.heav">
        <str>422</str>
        <str>533</str>
        <str>547</str>
        <str>549</str>
      </arr>
      <str name="hea.resource_type">808</str>
      <arr name="hea.theme">
        <str>778</str>
      </arr>
      <arr name="hea.title">
        <str>Text</str>
      </arr>
      <date name="timestamp">2013-11-07T08:12:22.684Z</date>
    </doc>
  </result>
</response>
```
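The sample appears to be an Apache Solr XML response (the `numFound` and `start` attributes on `<result>` are typical of Solr, which Islandora commonly uses). If so, each `<doc>` is one record, and each field is either a single value (`<str>`, `<date>`) or a multi-valued `<arr>` of them, identified by its `name` attribute. The following is a minimal sketch that flattens one `<doc>` into a dictionary mapping field name to a list of values. It is written in Python with the standard-library `xml.etree.ElementTree` as a stand-in for jQuery or Perl, because it runs without a browser; the function names `inner_markup` and `doc_to_record` are hypothetical. It also covers one pitfall: if HTML such as the `hea.abstract` content arrives unescaped, as in the sample, the parser reads `<p>` and `<ul>` as child elements rather than text, so taking only an element's text would silently drop them (and unescaped HTML that is not well-formed XML, such as a bare `<br>`, would make the whole feed fail to parse).

```python
import xml.etree.ElementTree as ET


def inner_markup(elem):
    """Return an element's content as a string, serialising any child
    elements (e.g. unescaped <p>/<ul> HTML) back to markup."""
    parts = [elem.text or ""]
    for child in elem:
        tail = child.tail
        child.tail = None  # serialise the child alone; its tail is appended below
        # short_empty_elements=False keeps <iframe></iframe> instead of <iframe />
        parts.append(ET.tostring(child, encoding="unicode", short_empty_elements=False))
        child.tail = tail
        parts.append(tail or "")
    return "".join(parts).strip()


def doc_to_record(doc):
    """Flatten one <doc> into {field name: [values]} (hypothetical helper)."""
    record = {}
    for field in doc:
        name = field.get("name")
        if field.tag == "arr":  # multi-valued field: one value per child element
            record[name] = [inner_markup(value) for value in field]
        else:  # single-valued <str>, <date>, ...
            record[name] = [inner_markup(field)]
    return record


if __name__ == "__main__":
    sample = """<response><result name="response" numFound="1" start="0"><doc>
      <str name="PID">islandora:4466</str>
      <arr name="dc.subject"><str>heav422</str><str>theme778</str></arr>
      <arr name="hea.abstract"><str><p>Text</p><ul><li>Text</li></ul></str></arr>
    </doc></result></response>"""
    root = ET.fromstring(sample)
    for doc in root.iter("doc"):
        print(doc_to_record(doc))
```

Keeping every field as a list, even single-valued ones, means later code does not need to know in advance which fields Solr returns as `<arr>`.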
Ideally I would like to develop something that breaks the large XML feed into individual XML files, one per record, for use as Data Capture Records.
My initial idea is to use jQuery's `$.parseXML` to separate the XML into individual records, save each one as its own .xml file, and then load them into our CMS and convert them to DCRs using the CMS's own functionality.
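Splitting the feed into one file per record could be sketched as follows, again in Python under the same assumptions. For a very large feed, a streaming parser (`ElementTree.iterparse` here; XML::Twig is a common Perl equivalent) avoids holding the whole document in memory: each `<doc>` is written out as soon as its end tag is reached and then cleared. The function names and the file-naming scheme (the `PID`, with characters such as `:` that Windows does not allow in file names replaced by `_`) are hypothetical choices. The function also returns the `numFound` value so it can be checked against the number of records written: if the feed is a Solr query, each response contains at most `rows` records (10 by default) starting at offset `start`, so a response can report `numFound="3559"` while holding far fewer `<doc>` elements, and the remaining pages have to be requested separately.

```python
import os
import re
import xml.etree.ElementTree as ET


def safe_filename(pid):
    """'islandora:4466' -> 'islandora_4466' (':' is not allowed on Windows)."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", pid)


def split_feed(source, out_dir):
    """Stream a Solr-style XML response, writing each <doc> to its own file.

    source may be a file name or a binary file object (hypothetical helper).
    Returns (number of docs written, numFound reported by <result>).
    """
    os.makedirs(out_dir, exist_ok=True)
    num_found = None
    written = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "result":
            # attributes are already available on the start event
            num_found = int(elem.get("numFound", "0"))
        elif event == "end" and elem.tag == "doc":
            # fall back to a running number if a record has no PID
            pid = elem.findtext("str[@name='PID']") or f"doc{written}"
            path = os.path.join(out_dir, safe_filename(pid) + ".xml")
            ET.ElementTree(elem).write(path, encoding="utf-8", xml_declaration=True)
            written += 1
            elem.clear()  # release the finished record's contents
    return written, num_found
```

For example, `split_feed("feed.xml", "records")` (both paths hypothetical) returns `(written, num_found)`; if `written` is less than `num_found`, the file held only one page of results. The same splitting can be done with jQuery's `$.parseXML`, but a web page generally cannot write files to the local disk by itself, so this step is usually easier as a command-line or server-side script.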
I have searched online and there seem to be lots of more complicated ways of doing this, so I would be grateful for any guidance on how to approach it.
This is the first time I have attempted anything like this, and my deadline takes that into account, so I would appreciate any hints, tips, or further reading. I am still at the initial research stage, so I have not yet started putting a solution together.
If I have missed anything you need to know to give better advice, please ask and I will endeavor to post the answer as soon as possible.
Thank you for taking a look, and for any advice you can give.
**Curious to know why this has been marked down without any comment.**
Dan