R - readtext and list of .xml files

Question

I'm trying to create a corpus and a VCorpus from a batch of .xml files, for quantitative linguistics. With .txt files I usually write:

library(tm)
library(stopwords)
library(magrittr) 
library(dplyr) 
library(readtext)
library(quanteda)
library(quanteda.textmodels)
library(quanteda.textplots)
library(quanteda.textstats)

object <- readtext("directory")
and
objectV <- DirSource("directory") %>%
  VCorpus(readerControl = list(language = "it-IT"))

When I try the same with a directory containing .xml files, I get this error:

The xml format does not fit for the extraction without xPath
Use xPath method instead

Any suggestions? Thank you! E.

How are the xml files structured? Are you trying to make the text from all nodes in the document? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please include the necessary `library()` commands in your sample code so it can be easily copy/pasted for testing. — MrFlick, Dec 12 '22 at 16:44
This is an example: https://drive.google.com/file/d/1ToSd47jRPknIRzQOyZIP9blOha-0aLgd/view?usp=share_link; I'd like to create a corpus and a DTM out of about a hundred documents like this. — SubotnikOne, Dec 12 '22 at 19:22
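
The error message itself points to XPath. Here is a minimal sketch of one way to follow that hint. It assumes the xml2 package is installed and that the text of interest sits in `<body>` elements. The real structure of the files is not shown in the question, so the `//body` default and the helper name `read_xml_texts` are hypothetical and would need adjusting:

```r
library(xml2)

# Hypothetical helper: read every .xml file in `dir` and return one string per
# file, built from the text of all nodes matched by `xpath`.
# ASSUMPTION: "//body" is a placeholder; replace it with an XPath expression
# that matches the elements holding the text in your own files.
read_xml_texts <- function(dir, xpath = "//body") {
  files <- list.files(dir, pattern = "\\.xml$", full.names = TRUE,
                      ignore.case = TRUE)
  texts <- vapply(files, function(path) {
    doc <- read_xml(path)                 # parse one file
    nodes <- xml_find_all(doc, xpath)     # select the text-bearing nodes
    # join the text of all matched nodes into a single document string
    paste(xml_text(nodes, trim = TRUE), collapse = " ")
  }, character(1))
  names(texts) <- basename(files)         # file names become document names
  texts
}
```

The named character vector it returns can then be passed to `quanteda::corpus()`, or wrapped as `VCorpus(VectorSource(texts), readerControl = list(language = "it-IT"))` for tm. If the files declare a default namespace, an unprefixed XPath such as `//body` will match nothing unless the namespace is handled, for example with `xml2::xml_ns_strip()`.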
