I have a similar question to this: How to transform XML data into a data.frame?

I have an XML file that I want to convert to a data frame. But when I try this on my data, it doesn't work, because the elements of my list have different numbers of entries:

 df3 = plyr::ldply(xmlToList(books), data.frame)

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 9, 10

Could anyone tell me how to convert XML to a data frame when the elements of my list have different lengths?

Thanks,

kay
  • Can you give a sample dataset with the type of data you are using? It is needed to turn your question into a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – polka Aug 30 '16 at 17:05
  • I am not able to attach any files to the query here. So I loaded the file to my google drive: https://drive.google.com/open?id=0B3-883ME4sP3c01YUlIzV2M2SU0 – kay Aug 30 '16 at 19:40

1 Answer

If you look closely at the XML file, there are 105 nodes under patient. If you pick just one, like "drugs", you still get 22 subnodes: some tags with text and attributes, some with only attributes, and some with further subnodes. ldply can do lots of things, but it cannot combine a structure this irregular into one data frame.

library(XML)
doc <- xmlParse(file)  # file: the XML document from the question
x <- xmlToList(doc)
names(x)
[1] "admin"   "patient" ".attrs" 
names(x$patient)
  [1] "additional_studies"                                                                                              
  [2] "tumor_tissue_site"                                                                                               
  [3] "tumor_tissue_site_other"                                                                                         
  [4] "prior_dx"                                                                                                        
  [5] "gender"                                                                                                          
  [6] "vital_status"                                                                                                    
  [7] "days_to_birth"                                                                                                                   
...
  [103] "drugs"                                                                                                           
  [104] "radiations"                                                                                                      
  [105] "clinical_cqcf"  

sapply(x$patient$drugs$drug, names) 
## text and attributes (usually 9)
$tx_on_clinical_trial
[1] "text"   ".attrs"

# attributes only
$regimen_number
[1] "preferred_name"     "display_order"      "cde"                "cde_ver"           
[5] "xsd_ver"            "tier"               "owner"              "procurement_status"
[9] "restricted"        

## 2 sub nodes 
$therapy_types
[1] "therapy_type"       "therapy_type_notes"
...
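
One possible way around the differing lengths, shown here only as a minimal sketch on a made-up toy document (not the file from the question), is to flatten each record with unlist() and bind the rows with plyr::rbind.fill, which fills missing columns with NA. The helper name flatten_record and the toy XML are assumptions for illustration; the real file would likely need the irregular nodes handled one group at a time.

```r
library(XML)   # xmlParse, xmlRoot, xmlChildren, xmlToList
library(plyr)  # rbind.fill

# Hypothetical toy document: the second <drug> has no <dose>
# and carries an extra "tier" attribute on <name>.
txt <- paste0(
  '<drugs>',
  '<drug><name cde="1">aspirin</name><dose>100</dose></drug>',
  '<drug><name cde="2" tier="1">ibuprofen</name></drug>',
  '</drugs>'
)

doc <- xmlParse(txt, asText = TRUE)
records <- xmlChildren(xmlRoot(doc))

# Hypothetical helper: flatten one record (text, attributes, subnodes)
# into a one-row data frame with one column per leaf value.
flatten_record <- function(node) {
  flat <- unlist(xmlToList(node))
  as.data.frame(as.list(flat), stringsAsFactors = FALSE)
}

# rbind.fill aligns the rows by column name and pads the gaps with NA.
df <- rbind.fill(unname(lapply(records, flatten_record)))
print(df)
```

Each record becomes one row, and a field that is absent from a record shows up as NA instead of triggering the "differing number of rows" error.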
Chris S.