I'm trying to import texts from XML files with the readtext package, in order to then create and explore a corpus with quanteda. From the help page I've figured out how to import the texts, but I'd like to know whether docvars can be created from node attributes in the XML files.
Consider the following XML file:
<corpus>
<text author="Bill" date="1928-05-27">
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed non risus. Suspendisse lectus tortor, dignissim sit amet, adipiscing nec, ultricies sed, dolor. Cras elementum ultrices diam. Maecenas ligula massa, varius a, semper congue, euismod non, mi. Proin porttitor, orci nec nonummy molestie, enim est eleifend mi, non fermentum diam nisl sit amet erat. Duis semper. Duis arcu massa, scelerisque vitae, consequat in, pretium a, enim. Pellentesque congue. Ut in risus volutpat libero pharetra tempor. Cras vestibulum bibendum augue.
</text>
</corpus>
You can import the content of the text node as the text field using an XPath expression:
library(readtext)
texts <- readtext("file.xml", text_field = ".//text", encoding = "utf-8", verbosity = 3)
But I don't know whether node attributes (author and date in this case) can be imported as docvars.
If so, any help achieving that would be much appreciated!
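To make the desired result concrete, here is a rough sketch of what I'm after, built by hand with the xml2 package instead of readtext. This is only a workaround under assumptions: xml2 is installed, the XML is inlined as a string so the snippet runs on its own, and the doc_id naming scheme is made up for illustration. It does not show any readtext feature.

```r
library(xml2)

# Same structure as the sample file above, inlined so the sketch is self-contained.
# For a real file, read_xml("file.xml") would be used instead.
xml_string <- '<corpus>
<text author="Bill" date="1928-05-27">
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
</text>
</corpus>'

doc <- read_xml(xml_string)

# Select every <text> node with the same XPath expression used for readtext
nodes <- xml_find_all(doc, ".//text")

# One row per node: the node content plus one column per attribute
docs <- data.frame(
  doc_id = paste0("file.xml.", seq_along(nodes)),  # hypothetical naming scheme
  text   = trimws(xml_text(nodes)),
  author = xml_attr(nodes, "author"),
  date   = as.Date(xml_attr(nodes, "date")),
  stringsAsFactors = FALSE
)

# The data frame could then be turned into a corpus, with the remaining
# columns (author, date) becoming docvars:
# corp <- quanteda::corpus(docs, docid_field = "doc_id", text_field = "text")
```

Ideally readtext would produce the author and date columns directly, without this manual step.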