I have a batch of XML text that I need to iterate over to extract some data. I know regex is not the best tool for this, but the data I need is minimal and I have already been able to extract it with regex. The problem is that I need the extracted data to appear in order, paragraph by paragraph. The sample below is what I am extracting from; each paragraph begins with a pnum=1, pnum=2, ... marker, so I need to iterate over those markers. How can I do this with regex? Would lookarounds help?
First paragraph:
<p pnum=1>
<s snum=1>
<wf cmd=done pos=NN lemma=committee wnsn=1 lexsn=1:14:00::>Committee</wf>
<wf cmd=done pos=NN lemma=approval wnsn=1 lexsn=1:04:02::>approval</wf>
<wf cmd=ignore pos=IN>of</wf>
<wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Gov._Price_Daniel</wf>
<wf cmd=done pos=NN lemma=banker wnsn=1 lexsn=1:18:00::>bankers</wf>
<punc>.</punc>
</s>
</p>
Second paragraph:
<p pnum=2>
<s snum=2>
<wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Daniel</wf>
<wf cmd=done pos=RB lemma=personally wnsn=1 lexsn=4:02:01::>personally</wf>
<wf cmd=done pos=VB lemma=lead wnsn=7 lexsn=2:41:00::>led</wf>
<punc>.</punc>
</s>
</p>
....
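A minimal sketch of one possible approach, in Python, that does not need lookarounds: first match each `<p pnum=N> ... </p>` block in document order, then run a second regex only on the text inside that block. What to extract is an assumption here (the `lemma` attribute of each `<wf>` tag), and the names `PARA_RE`, `WF_RE`, and `lemmas_by_paragraph` are hypothetical, chosen only for illustration.

```python
import re

# The two sample paragraphs from the question, used as test input.
SAMPLE = """<p pnum=1>
<s snum=1>
<wf cmd=done pos=NN lemma=committee wnsn=1 lexsn=1:14:00::>Committee</wf>
<wf cmd=done pos=NN lemma=approval wnsn=1 lexsn=1:04:02::>approval</wf>
<wf cmd=ignore pos=IN>of</wf>
<wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Gov._Price_Daniel</wf>
<wf cmd=done pos=NN lemma=banker wnsn=1 lexsn=1:18:00::>bankers</wf>
<punc>.</punc>
</s>
</p>
<p pnum=2>
<s snum=2>
<wf cmd=done rdf=person pos=NNP lemma=person wnsn=1 lexsn=1:03:00:: pn=person>Daniel</wf>
<wf cmd=done pos=RB lemma=personally wnsn=1 lexsn=4:02:01::>personally</wf>
<wf cmd=done pos=VB lemma=lead wnsn=7 lexsn=2:41:00::>led</wf>
<punc>.</punc>
</s>
</p>
"""

# Outer pattern: one match per paragraph. DOTALL lets "." span newlines,
# and the lazy ".*?" stops at the first closing </p>.
PARA_RE = re.compile(r"<p pnum=(\d+)>(.*?)</p>", re.DOTALL)

# Inner pattern (assumption: we want the lemma attribute of each <wf> tag).
# "[^>]" keeps the match inside a single opening tag.
WF_RE = re.compile(r"<wf\b[^>]*?\blemma=([^\s>]+)[^>]*>([^<]*)</wf>")


def lemmas_by_paragraph(text):
    """Return [(pnum, [lemma, ...]), ...] in the order paragraphs appear."""
    result = []
    for para in PARA_RE.finditer(text):
        pnum = int(para.group(1))
        # Search only inside this paragraph's body.
        lemmas = [m.group(1) for m in WF_RE.finditer(para.group(2))]
        result.append((pnum, lemmas))
    return result


# One output line per paragraph.
for pnum, lemmas in lemmas_by_paragraph(SAMPLE):
    print(pnum, " ".join(lemmas))
```

Because `finditer` yields matches left to right, the paragraphs come out in document order, and everything extracted from one paragraph stays on one line. Tags without a `lemma` attribute (such as `<wf cmd=ignore pos=IN>of</wf>`) are skipped by the inner pattern.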
` from a larger text? – Mohammad Yusuf Feb 03 '17 at 06:40
bit so that whatever I extract from this portion shows up on the same line, and the next paragraph's info on the next line, and so on.
– serendipity Feb 03 '17 at 06:43