Sorry, edited with one more little nuance! I had simplified my raw file a little too much in the example I provided, so while your solution works beautifully as-is, what if there are a few extra things thrown into the second line? Those seem to throw off the xml_find_all(page, "//event"), since now it can't find that node. How can I get the script to ignore the extras (or maybe what is the right search term to incorporate them?) Thanks!!!
I'm new to working with xml, and I have some speech xml files that I'm trying to flatten into dataframes in R, but I can't get them to be read using some of the standard functions in the XML package. I think the problem is the plist format, because some of the other answers that I've tried to apply don't work on these files.
My files look as follows (*****second line edited):
<?xml version="1.0" encoding="us-ascii"?>
<event id="111" extraInfo="CivilwarSpeeches" xmlns = "someurl>
<meta>
<title>Gettysburg</title>
<date>1863-11-19</date>
<organizations>
<org>Union</org>
</organizations>
<people>
<person id="0" type="President">Honest Abe</person>
</people>
</meta>
<body>
<section name="Address">
<speaker id="0">
<plist>
<p>Four score and seven years ago</p>
</plist>
</speaker>
</section>
</body>
</event>
And I would like to end up with a dataframe that links some of the info in the two sections, something like
Section|Speaker|Speaker Type| Speaker Name|Body
Address|0 |President | Honest Abe |Four score and seven years ago
I found this answer fairly helpful, but it still can't seem to unpack my data. Parsing XML file with known structure and repeating elements
Any help would be appreciated!