I am very new to both Hadoop and Pig. I have been able to write a number of simple programs, but one that is taxing me is processing XML when part of the XML file is malformed.
I can use XMLLoader('tag') to get all of the tags from an XML file, which is great. However, if one of them is missing a well-formed closing tag, Pig stops at that one. For example:
<tag>
</tag>
<tag>
</tag1>
<tag>
</tag>
This will only pick up the first valid tag. I have experience with JAQL, where I am able to ignore the malformed record so that the application also picks up the second valid tag (the third one in the example above).
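To make the behaviour I am after concrete, here is a rough sketch in plain Python (not Pig or JAQL). The function name extract_well_formed and the record bodies "first", "second" and "third" are made up for illustration. It only shows the kind of skip-the-bad-record logic I have in mind, assuming records are not nested, and is not how XMLLoader works internally.

```python
import re

def extract_well_formed(text, tag):
    """Return the bodies of <tag>...</tag> records, skipping records whose close tag is malformed."""
    open_tag = re.escape("<%s>" % tag)
    close_tag = re.escape("</%s>" % tag)
    # The body may not contain another opening tag, so a record with a
    # malformed close tag cannot swallow the record that follows it.
    pattern = re.compile(
        open_tag + r"((?:(?!" + open_tag + r").)*?)" + close_tag,
        re.DOTALL,
    )
    return [m.group(1).strip() for m in pattern.finditer(text)]

# Hypothetical input: the second record is closed with </tag1> instead of </tag>.
sample = """<tag>
first
</tag>
<tag>
second
</tag1>
<tag>
third
</tag>"""

records = extract_well_formed(sample, "tag")
print(records)  # ['first', 'third']
```

The malformed middle record is dropped, and the record after it is still returned, which is the result I would like to get from Pig.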
My question is: is there a way to handle poorly formed XML using Pig, rather than JAQL?