I need to parse 2000 XML files. I have managed to extract my data from the files automatically. I am a complete beginner, so the code may look messy; here is an example:
library(XML)
# pattern is a regular expression: "\\.xml$" matches only names ending in .xml
filenames <- list.files("C:/...", recursive=TRUE, full.names=TRUE, pattern="\\.xml$")
name <- unlist(lapply(filenames, function(f) {
  xml <- xmlParse(f)
  xpathSApply(xml, "//...", xmlValue)
}))
data <- data.frame(name)
This works for most of the data I need, but some files are missing certain elements, so I can't include them because the extracted vectors end up with different numbers of rows. An example of what the files look like:
File 1:
<Kontaktdaten>
<Name> Name </Name>
<ID>12345678</ID>
<Kontakt_Zugang>
<Strasse>ABC-Strasse</Strasse>
<Hausnummer>1</Hausnummer>
<Postleitzahl>12345</Postleitzahl>
<Ort>ABC</Ort>
</Kontakt_Zugang>
</Kontaktdaten>
File 2 (where "Hausnummer" is missing, for example):
<Kontaktdaten>
<Name> Name2 </Name>
<ID>8765321</ID>
<Kontakt_Zugang>
<Strasse>CBA-Strasse</Strasse>
<Postleitzahl>54321</Postleitzahl>
<Ort>CBA</Ort>
</Kontakt_Zugang>
</Kontaktdaten>
Is there a way to combine them in one data.frame anyway, or to create a second data.frame with only the "Hausnummer" and the ID?
EDIT: This is only an example to show my problem. The original files are up to 500 nodes long, and some of the nodes are duplicated.
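To make the desired result concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the XML package, uses two in-memory strings in place of the real files, and introduces a hypothetical helper `first_or_na` that returns NA when a node is absent. The XPath expressions fit only the small example above, and taking the first match is an assumption about how duplicated nodes should be handled.

```r
library(XML)

# Two in-memory documents standing in for the real files (assumption for the sketch)
xml_docs <- c(
  "<Kontaktdaten><Name>Name</Name><ID>12345678</ID><Kontakt_Zugang><Strasse>ABC-Strasse</Strasse><Hausnummer>1</Hausnummer><Postleitzahl>12345</Postleitzahl><Ort>ABC</Ort></Kontakt_Zugang></Kontaktdaten>",
  "<Kontaktdaten><Name>Name2</Name><ID>8765321</ID><Kontakt_Zugang><Strasse>CBA-Strasse</Strasse><Postleitzahl>54321</Postleitzahl><Ort>CBA</Ort></Kontakt_Zugang></Kontaktdaten>"
)

# Hypothetical helper: value of the first node matching `path`, or NA if there is none
first_or_na <- function(doc, path) {
  values <- xpathSApply(doc, path, xmlValue)
  if (length(values) == 0) NA_character_ else values[[1]]
}

# Build one single-row data.frame per document, so every document contributes
# exactly one row even when "Hausnummer" is missing
rows <- lapply(xml_docs, function(txt) {
  doc <- xmlParse(txt, asText = TRUE)
  data.frame(
    ID = first_or_na(doc, "//ID"),
    Hausnummer = first_or_na(doc, "//Kontakt_Zugang/Hausnummer"),
    stringsAsFactors = FALSE
  )
})

# Stack the per-document rows into one data.frame
data <- do.call(rbind, rows)
print(data)
```

With real files, `xmlParse(f)` on each entry of `filenames` would replace the `asText = TRUE` call. Because each file yields exactly one row, the columns always have the same length.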