I have a folder of .xml files, each of which looks like this:
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Owner="NLM" Status="MEDLINE">
<PMID Version="1">23458631</PMID>
<DateCreated>
<Year>2013</Year>
<Month>04</Month>
<Day>08</Day>
</DateCreated>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Animals</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Calcium</DescriptorName>
<QualifierName MajorTopicYN="Y">metabolism</QualifierName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Calcium Chloride</DescriptorName>
<QualifierName MajorTopicYN="N">administration &amp; dosage</QualifierName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</PubmedArticle>
<PubmedArticle>
<MedlineCitation Status="Publisher" Owner="NLM">
<PMID Version="1">23458629</PMID>
<DateCreated>
<Year>2013</Year>
<Month>3</Month>
<Day>20</Day>
</DateCreated>
<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Adolescent</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Adult</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Anthropometry</DescriptorName>
</MeshHeading>
</MeshHeadingList>
</MedlineCitation>
</PubmedArticle>
</PubmedArticleSet>
I would like to use Python to parse the XML files and extract the PMID, the DateCreated, and every DescriptorName with its MajorTopicYN attribute for each article, then save the result as a .txt file that looks like this:
ArticleID|CreatedDate|MeSH|IsMajor
23458631|20130408|Animals|N
23458631|20130408|Calcium|N
23458631|20130408|Calcium Chloride|N
23458629|20130320|Adolescent|N
23458629|20130320|Adult|N
23458629|20130320|Anthropometry|N
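
For reference, here is a minimal sketch of the kind of thing I have in mind, using the standard library's xml.etree.ElementTree. The function names (article_rows, write_table) and the folder and output paths are placeholders I made up. It assumes the real files are well-formed XML (for instance, a literal & is written as &amp;) and that every DateCreated has Year, Month and Day children. Month and Day are zero-padded so that a value like 3 becomes 03, as in the desired output.

```python
import glob
import os
import xml.etree.ElementTree as ET


def article_rows(xml_path):
    """Yield (pmid, created, mesh, is_major) tuples for one XML file."""
    root = ET.parse(xml_path).getroot()
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID", default="").strip()

        # Build YYYYMMDD, zero-padding Month and Day to two digits.
        created = ""
        date = citation.find("DateCreated")
        if date is not None:
            year = date.findtext("Year", default="").strip()
            month = date.findtext("Month", default="").strip().zfill(2)
            day = date.findtext("Day", default="").strip().zfill(2)
            created = year + month + day

        # One output row per DescriptorName; QualifierName is ignored.
        for descriptor in citation.iter("DescriptorName"):
            mesh = (descriptor.text or "").strip()
            is_major = descriptor.get("MajorTopicYN", "")
            yield (pmid, created, mesh, is_major)


def write_table(xml_dir, out_path):
    """Write one pipe-delimited table for every .xml file in xml_dir."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("ArticleID|CreatedDate|MeSH|IsMajor\n")
        for xml_path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            for row in article_rows(xml_path):
                out.write("|".join(row) + "\n")
```

It would be called as write_table("path/to/xml_folder", "result.txt"), where both paths are hypothetical. Articles are written in document order within each file, and files are processed in sorted filename order.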