I have a large, unorganized XML file that I need to search to determine whether certain ID numbers are in it. I would like to use R to do so, but because of the format I am having trouble converting it to a data frame, or even a list, to export to a CSV. I figured I could search it easily if it were in CSV format. So I need help understanding how to convert and extract it properly, or how to search the document for values using R. Below is the code I have used to try to convert the document, but my various attempts produce several errors.

## Method 1. I tried to convert it to a data frame, but the columns are not all the same length.

    require(XML)
    require(plyr)

    file<-"EJ.XML"

    doc <- xmlParse(file,useInternalNodes = TRUE)
    xL <- xmlToList(doc)

    data <- ldply(xL, data.frame)
    datanew <- read.table(data, header = FALSE, fill = TRUE)


## Method 2. I tried to convert it to a list; the file is extracted, but the output lists only 2 words from the file.


    data<- xmlParse("EJ.XML")
    print(data)
    head(data)
    xml_data<- xmlToList(data)

    class(data)
    topxml <- xmlRoot(data)
    topxml <- xmlSApply(topxml,function(x) xmlSApply(x, xmlValue))
    xml_df <- data.frame(t(topxml),
                         row.names=NULL)


    write.csv(xml_df, file = "MyData.csv",row.names=FALSE)

I am going to do some research on how to search within R as well, but I assume the file needs to be in a data frame or list to do so either way. Any help is appreciated! Attached is a screenshot of the data. I am interested in finding entity ID numbers that match a list I have in an Excel document.

ANN
  • Without knowing what the XML file looks like, it is hard to give good advice – Jaap Jul 11 '17 at 11:47
  • Yes! I apologize, I should have included that. Let me attach a screenshot. I am interested in finding matching entity ID numbers to a list I have in a CSV document. – ANN Jul 11 '17 at 11:58
  • The screenshot is there :) – ANN Jul 11 '17 at 12:03
  • Don't post your data as an image; please include a [reproducible example](https://stackoverflow.com/q/5963269/2204410) – Jaap Jul 11 '17 at 12:17
  • Thank you very much for your help, Jaap, but I am unable to copy the code or print it in R. I tried to copy a few lines, but it's not taking, maybe due to the file size. – ANN Jul 11 '17 at 12:48
  • XML almost by definition is organized. There are tags with attributes and values in your example. What tag are you looking to search on, organization, entity? If you are just looking for a particular string, why not use `readLines()` and `grepl()`? – hrbrmstr Jul 11 '17 at 12:53
  • I am trying to search for matches to the entity ID. Okay, yeah, that makes sense; I misspoke in stating it was unorganized. I am so unfamiliar with XML and new to R as well. I just looked up both of those commands; can you give me an example of how to use them? Thank you Jaap! – ANN Jul 11 '17 at 12:58
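
A minimal sketch of the `readLines()` / `grepl()` idea raised in the comments, using base R only. The real layout of `EJ.XML` is not shown in the thread, so the sample file, the `entity` element, its `id` attribute, and the ID values below are all hypothetical stand-ins; in practice `ids` would presumably be read from the spreadsheet (for example after exporting it to CSV) and `tmp` would be replaced by `"EJ.XML"`.

```r
# Hypothetical IDs to look for (assumed values, not from the original data).
ids <- c("1001", "2002", "9999")

# A tiny stand-in for EJ.XML so the sketch runs on its own. The element and
# attribute names are invented for illustration.
tmp <- tempfile(fileext = ".xml")
writeLines(c(
  "<organizations>",
  "  <organization>",
  "    <entity id=\"1001\">Alpha</entity>",
  "    <entity id=\"3003\">Beta</entity>",
  "  </organization>",
  "  <organization>",
  "    <entity id=\"2002\">Gamma</entity>",
  "  </organization>",
  "</organizations>"
), tmp)

# Read the file as plain text, one vector element per line.
lines <- readLines(tmp)

# For each ID, check whether any line contains it as a literal string.
# fixed = TRUE switches off regular-expression matching.
found <- vapply(
  ids,
  function(id) any(grepl(id, lines, fixed = TRUE)),
  logical(1)
)

found        # named logical vector, one entry per ID
ids[found]   # IDs that appear somewhere in the file
ids[!found]  # IDs that do not appear
```

Plain substring matching can over-match: `"1001"` would also be found inside `"11001"`. If the IDs really do sit in an attribute, searching for the quoted form (for example `paste0("id=\"", id, "\"")`, again assuming that attribute name) would be stricter.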

0 Answers