
I'm trying to create a corpus and a VCorpus from a batch of .xml files for quantitative linguistics. With .txt files I usually write:

```r
library(tm)
library(stopwords)
library(magrittr)
library(dplyr)
library(readtext)
library(quanteda)
library(quanteda.textmodels)
library(quanteda.textplots)
library(quanteda.textstats)

object <- readtext("directory")
```

and

```r
objectV <- DirSource("directory") %>%
  VCorpus(readerControl = list(language = "it-IT"))
```

When I try the same on a directory containing .xml files, I get this error:

```
The xml format does not fit for the extraction without xPath
  Use xPath method instead
```

Any suggestions? Thank you! E.

  • How are the xml files structured? Are you trying to make the text from all nodes in the document? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please include the necessary `library()` commands in your sample code so it can be easily copy/pasted for testing. – MrFlick Dec 12 '22 at 16:44
  • This is an example: https://drive.google.com/file/d/1ToSd47jRPknIRzQOyZIP9blOha-0aLgd/view?usp=share_link; I'd like to create a Corpus and a DTM out of a hundred documents like this – SubotnikOne Dec 12 '22 at 19:22
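Since the structure of the linked file isn't shown here, the following is only a minimal sketch under assumptions: it swaps readtext's built-in XML reader for parsing each file with xml2, pulling the text out with an XPath expression, and then building both a quanteda corpus/DFM and a tm VCorpus/DTM from the resulting character vector. The sample files, the `<doc>`/`<p>` layout, the XPath `"//p"` and the helper `read_xml_text()` are all hypothetical; replace them with your own directory and an XPath that matches the nodes holding the text you want.

```r
library(xml2)
library(tm)
library(quanteda)

# Hypothetical sample data: two small XML files in a temporary directory.
# The real files' structure is unknown; adjust the XPath below to match it.
xml_dir <- file.path(tempdir(), "xml_corpus")
dir.create(xml_dir, showWarnings = FALSE)
writeLines("<doc><title>Primo</title><p>Il gatto dorme.</p><p>Il cane abbaia.</p></doc>",
           file.path(xml_dir, "doc1.xml"))
writeLines("<doc><title>Secondo</title><p>La casa grande.</p></doc>",
           file.path(xml_dir, "doc2.xml"))

# Hypothetical helper: text of all nodes matched by `xpath`,
# collapsed into a single string per file.
read_xml_text <- function(path, xpath = "//p") {
  doc <- read_xml(path)
  paste(xml_text(xml_find_all(doc, xpath)), collapse = " ")
}

files <- list.files(xml_dir, pattern = "\\.xml$", full.names = TRUE)
texts <- vapply(files, read_xml_text, character(1), USE.NAMES = FALSE)
names(texts) <- basename(files)  # file names become document names

# quanteda: corpus and document-feature matrix
corp <- corpus(texts)
dfmat <- dfm(tokens(corp, remove_punct = TRUE))

# tm: VCorpus and document-term matrix
vcorp <- VCorpus(VectorSource(texts), readerControl = list(language = "it"))
dtm <- DocumentTermMatrix(vcorp, control = list(removePunctuation = TRUE))
```

To choose the XPath, `xml_structure(read_xml(files[1]))` prints the node tree of one file. The error message also suggests that readtext itself offers an XPath-based extraction for XML; if `?readtext` in your installed version documents that for `text_field`, it may be a shorter route than the manual parsing above.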
