
I have an XML-TEI file:

#in R
library(XML)
doc <- xmlTreeParse("FILE_NAME", useInternalNodes = TRUE, encoding = "UTF-8")
ns <- c(ns = "http://www.tei-c.org/ns/1.0")  # prefix "ns" bound to the TEI namespace
getNodeSet(doc, "//* and //@*", ns)
doc
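A side note on the XPath above, as a minimal sketch: in XPath 1.0, `and` combines two expressions into a single boolean, so `"//* and //@*"` cannot return nodes. To fetch several kinds of elements at once, the union operator `|` can be used instead, and `xmlAttrs()` then lists the attributes of each matched node. The inline string `tei` is a hypothetical stand-in for `"FILE_NAME"`: it simply wraps the `<l>` sample from this question in a TEI root element.

```r
library(XML)

# Hypothetical stand-in for "FILE_NAME": the <l> sample wrapped in a TEI root
tei <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
<l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">
 <w>[...]</w>
 <w type="verb" ana="#MḪṢ01 #confrontation #action #ANT" xml:id="ktu1-3_ii_l5b-6a_tmtḫṣ" lemmaRef="uga/verb.xml#mḫṣ">tmtḫṣ</w>
 <g>.</g>
</l>
</TEI>'
doc <- xmlParse(tei, asText = TRUE, encoding = "UTF-8")
ns <- c(ns = "http://www.tei-c.org/ns/1.0")

# "|" returns the union of both node sets, in document order (l, w, w)
nodes <- getNodeSet(doc, "//ns:l | //ns:w", namespaces = ns)

# One named character vector of attributes per node (NULL if it has none)
attrs <- lapply(nodes, xmlAttrs)
attrs
```

By default `xmlAttrs()` drops namespace prefixes from the names, so `xml:id` shows up as `id` in this listing.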

I am interested in two elements of my XML-TEI file, <l> and <w>, and in specific attributes: (1) @xml:id on <l>, and (2) type="verb" and ana="#confrontation #action #ANT" on <w>:

#example of element <l> and its child <w> in XML-TEI FILE    
<l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">
 <w>[...]</w>
 <w type="verb" ana="#MḪṢ01 #confrontation #action #ANT" xml:id="ktu1-3_ii_l5b-6a_tmtḫṣ" lemmaRef="uga/verb.xml#mḫṣ">tmtḫṣ</w>
 <g>.</g>
</l>

I use the function getNodeSet:

#in R
l_cont <- getNodeSet(doc, "//ns:l[(@xml:id)]", ns) 
l_cont

As expected, this returns each whole <l> element, with all of its children and attributes. But I would like to select only the relevant attributes and their values, to get something like this:

#desired result
xml:id="ktu1.3_ii_5b-6a"
type="verb" ana="#confrontation #action #ANT"
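If only these values are needed, one option (a sketch, assuming the XML package and the same hypothetical inline sample in place of `"FILE_NAME"`) is to put the filtering into the XPath itself and read the attributes with `xmlGetAttr()`, instead of pulling whole `<l>` nodes. `@xml:id` needs no extra namespace mapping because the `xml` prefix is predefined in XPath.

```r
library(XML)

# Hypothetical stand-in for "FILE_NAME": the <l> sample wrapped in a TEI root
tei <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
<l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">
 <w>[...]</w>
 <w type="verb" ana="#MḪṢ01 #confrontation #action #ANT" xml:id="ktu1-3_ii_l5b-6a_tmtḫṣ" lemmaRef="uga/verb.xml#mḫṣ">tmtḫṣ</w>
 <g>.</g>
</l>
</TEI>'
doc <- xmlParse(tei, asText = TRUE, encoding = "UTF-8")
ns <- c(ns = "http://www.tei-c.org/ns/1.0")

# @xml:id of every <l> that has one; "xml:id" is passed on to xmlGetAttr
l_ids <- xpathSApply(doc, "//ns:l[@xml:id]", xmlGetAttr, "xml:id", namespaces = ns)

# type and ana of every <w type="verb">
verbs <- getNodeSet(doc, "//ns:w[@type='verb']", namespaces = ns)
w_attrs <- data.frame(
  type = sapply(verbs, xmlGetAttr, "type"),
  ana  = sapply(verbs, xmlGetAttr, "ana"),
  stringsAsFactors = FALSE
)

l_ids
w_attrs
```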

Following the suggestion in another post, "Load XML to Dataframe in R with parent node attributes", I tried:

#in R
attrTest <- function(x) {
 attrTest01 <- xmlGetAttr(x, "xml:id")
 w <- xpathApply(x, 'w', function(w) {
  ana <- xmlGetAttr(w, "ana")
  if(is.null(w))
 data.frame(attrTest01, ana)
 })
do.call(rbind, w)
}
res <- xpathApply(doc, "//ns:l[(@xml:id)]", ns ,attrTest)
temp.df <- do.call(rbind, res)

But it doesn't work; I get these errors:

> res <- xpathApply(doc, "//ns:l[(@xml:id)]", ns ,attrTest)
Error in get(as.character(FUN), mode = "function", envir = envir) : 
object 'http://www.tei-c.org/ns/1.0' of mode 'function' was not found
> temp.df <- do.call(rbind, res)
Error in do.call(rbind, res) : object 'res' not found
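For reference, a sketch of why this fails and one possible fix, assuming the XML package and a hypothetical inline sample in place of `"FILE_NAME"`. In `xpathApply(doc, path, fun, ..., namespaces = ...)` the third positional argument is `fun`, so `ns` was taken as the function to apply (the first error); `res` was therefore never created (the second error). `getNodeSet()` accepts `ns` positionally only because its third argument is `namespaces`. Inside `attrTest` there are two more problems: the relative path `'w'` has no namespace prefix, so it matches nothing in a TEI document, and `if(is.null(w))` takes the following `data.frame(...)` line as its body, so that line never runs for a real node.

```r
library(XML)

# Hypothetical stand-in for "FILE_NAME": the <l> sample wrapped in a TEI root
tei <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
<l n="5b-6a" xml:id="ktu1.3_ii_5b-6a">
 <w>[...]</w>
 <w type="verb" ana="#MḪṢ01 #confrontation #action #ANT" xml:id="ktu1-3_ii_l5b-6a_tmtḫṣ" lemmaRef="uga/verb.xml#mḫṣ">tmtḫṣ</w>
 <g>.</g>
</l>
</TEI>'
doc <- xmlParse(tei, asText = TRUE, encoding = "UTF-8")
ns <- c(ns = "http://www.tei-c.org/ns/1.0")

attrTest <- function(x) {
  l_id <- xmlGetAttr(x, "xml:id")
  # <w> is in the TEI namespace too, so the path relative to <l> needs "ns:"
  rows <- xpathApply(x, "./ns:w[@type='verb']", function(w) {
    data.frame(l_id = l_id,
               type = xmlGetAttr(w, "type"),
               ana  = xmlGetAttr(w, "ana"),
               stringsAsFactors = FALSE)
  }, namespaces = ns)
  do.call(rbind, rows)  # NULL when the <l> contains no verb
}

# namespaces must be passed by name: the third positional argument is fun
res <- xpathApply(doc, "//ns:l[@xml:id]", attrTest, namespaces = ns)
temp.df <- do.call(rbind, res)
temp.df
```

Each row pairs the `xml:id` of an `<l>` with the `type` and `ana` of a verb `<w>` inside it; lines without a verb contribute no rows.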

Do you have any suggestions? Thank you in advance.

Vanessa
The XML content is already included above, in the second section starting with `#example of element <l> and its child <w> in XML-TEI FILE`. – Vanessa Mar 13 '17 at 18:28

1 Answer


I would suggest using the R package tei2r (https://rdrr.io/github/michaelgavin/tei2r/). It has helped me when working with TEI-encoded files.

From this package, I would use importTexts to import the document and parseTEI to extract the exact nodes you are looking for.

Another way to import the files and extract the nodes could be this:

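library(tei2r)    # parseTEI
library(purrr)    # map_dfr
library(tibble)   # tibble
library(magrittr) # %>%

read_tei <- function(folder) {
  list.files(folder, pattern = '\\.xml$', full.names = TRUE) %>%
    map_dfr(~ .x %>% parseTEI(., node = "INSERT_NODE_TO_FIND") %>% tibble())
}

text <- read_tei("/Path/to/file")  # folder containing the .xml files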
Victor Harbo