I am trying to subset thead/tbody without directly calling rowlist$td$list$item$table$thead or rowlist[["td"]][["list"]][["item"]][["table"]][["thead"]]. This call, unlist(rowlist, use.names = FALSE)[grepl("tbody", names(unlist(rowlist)))], serves my purpose, except that I need the result as multiple rows (e.g. the two tr's in tbody). I could split it, but that seems counterintuitive. I know there should be a better way to work with HTML/XML, but this is what I've got for now.

str(rowlist)
List of 1
 $ td:List of 1
  ..$ list:List of 1
  .. ..$ item:List of 1
  .. .. ..$ table:List of 2
  .. .. .. ..$ thead:List of 1
  .. .. .. .. ..$ tr:List of 7
  .. .. .. .. .. ..$ th:List of 1
  .. .. .. .. .. .. ..$ : chr "Test"
  .. .. .. .. .. ..$ th:List of 1
  .. .. .. .. .. .. ..$ : chr "Outcome"
  .. .. .. .. .. ..$ th:List of 1
  .. .. .. .. .. .. ..$ : chr "Subset"
  .. .. .. .. .. ..$ th:List of 1
  .. .. .. .. .. .. ..$ : chr "Cups"
  .. .. .. .. .. ..$ th:List of 1
  .. .. .. .. .. .. ..$ : chr "Bowls"
  .. .. .. .. .. ..$ th:List of 1
  .. .. .. .. .. .. ..$ : chr "Plates"
  .. .. .. .. .. ..$ th:List of 1
  .. .. .. .. .. .. ..$ : chr "Jars"
  .. .. .. ..$ tbody:List of 2
  .. .. .. .. ..$ tr:List of 7
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "test1"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "High"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Low"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Gold"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Blue"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Green"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "red"
  .. .. .. .. .. ..- attr(*, "ID")= chr "id_511"
  .. .. .. .. ..$ tr:List of 7
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "test2"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Low"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "High"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Pink"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Blue"
  .. .. .. .. .. ..$ td:List of 1
  .. .. .. .. .. .. ..$ : chr "Purple"
  .. .. .. .. .. ..$ td: list()
  .. .. .. .. .. ..- attr(*, "ID")= chr "id_512"
  .. ..- attr(*, "styleCode")= chr "none"

List looks like this

rowlist<-list(td = structure(list(list = structure(list(item = list(table = list(
  thead = list(tr = list(
    th = list("Test"), th = list("Outcome"), th = list("Set"), th = list("Cups"), th = list("Bowls"), th = list( "Plates"), th = list("Jars"))), 
  tbody = list(tr = structure(
    list(td = list("test1"), td = list("High"), td = list("Low"), td = list("Gold"), td = list("Blue"), td = list("Green"), td = list("Red")), ID = "id_511"), 
    tr = structure(
      list(td = list("test2"), td = list("Low"), td = list("High"), td = list("Pink"), td = list("Blue"), td = list("Purple"), td = list()), ID = "id_512"))))), styleCode = "none")), colspan = "20"))
Bristle
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Use `dput()` rather than `str()` so we can copy/paste the data into R. Also, what was this data before a list? Was it an html document? Using an html parser would probably make it easier to navigate and subset the data rather than working with nested lists. – MrFlick Aug 04 '20 at 16:42
  • It's an HTML table within an XML doc. xml2's as_list() has given me easier access to some data through loops, e.g. instead of test/some/other/thing I can do test[[3]][[2]][[4]] – Bristle Aug 04 '20 at 18:23
  • With `xml2` you can easily select nodes with XPath expressions. It's a bit messier with lists of lists. Can you share the HTML/XML data instead? – MrFlick Aug 04 '20 at 18:25
  • Node selection has become an issue because some nodes share the same name but have both similar and different content, which gets badly mixed up. It is not a regular XML doc; it skirts the edge of XML rules, and the list makes mapping it much simpler for me. The XML is too huge to obfuscate and share. – Bristle Aug 04 '20 at 19:01

1 Answer

If the object has to be handled as a nested list, one approach is to use rrapply() from the rrapply package (an extension of base rapply()):

library(rrapply)  ## v1.2.1

out <- rrapply(rowlist, 
        classes = "list",
        condition = function(x, .xname) .xname %in% c("thead", "tbody"), 
        how = "flatten")

str(out, list.len = 2)
#> List of 2
#>  $ thead:List of 1
#>   ..$ tr:List of 7
#>   .. ..$ th:List of 1
#>   .. .. ..$ : chr "Test"
#>   .. ..$ th:List of 1
#>   .. .. ..$ : chr "Outcome"
#>   .. .. [list output truncated]
#>  $ tbody:List of 2
#>   ..$ tr:List of 7
#>   .. ..$ td:List of 1
#>   .. .. ..$ : chr "test1"
#>   .. ..$ td:List of 1
#>   .. .. ..$ : chr "High"
#>   .. .. [list output truncated]
#>   .. ..- attr(*, "ID")= chr "id_511"
#>   ..$ tr:List of 7
#>   .. ..$ td:List of 1
#>   .. .. ..$ : chr "test2"
#>   .. ..$ td:List of 1
#>   .. .. ..$ : chr "Low"
#>   .. .. [list output truncated]
#>   .. ..- attr(*, "ID")= chr "id_512"

Here, the condition function selects only nodes named thead or tbody; how = "flatten" returns them in a flat list (how = "prune" would instead prune the list while keeping its original nested structure); and classes = "list" makes rrapply() evaluate the condition on intermediate list nodes rather than skipping them (as base rapply() would).
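
If installing rrapply is not an option, the same selection can be sketched in base R, together with one possible way to get the tbody as one row per tr, as the question asks. This is only an illustration: find_nodes() and row_to_chr() are hypothetical helper names (not part of any package), and the sample rowlist is a trimmed three-column stand-in for the list in the question.

```r
# Trimmed stand-in for the question's list (three columns instead of seven).
rowlist <- list(td = structure(list(list = structure(list(item = list(table = list(
  thead = list(tr = list(th = list("Test"), th = list("Outcome"), th = list("Jars"))),
  tbody = list(
    tr = structure(list(td = list("test1"), td = list("High"), td = list("Red")), ID = "id_511"),
    tr = structure(list(td = list("test2"), td = list("Low"), td = list()), ID = "id_512")
  )))), styleCode = "none")), colspan = "20"))

# Hypothetical helper: recursively collect every element whose name is in
# `targets`, without spelling out the path to it.
find_nodes <- function(x, targets) {
  found <- list()
  if (!is.list(x)) return(found)
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  for (i in seq_along(x)) {
    if (nms[i] %in% targets) {
      found <- c(found, x[i])                        # keep the element's name
    } else {
      found <- c(found, find_nodes(x[[i]], targets)) # search one level deeper
    }
  }
  found
}

# Hypothetical helper: turn one tr into a character vector;
# an empty cell (list()) becomes NA.
row_to_chr <- function(tr) {
  vapply(tr,
         function(cell) if (length(cell)) as.character(cell[[1]]) else NA_character_,
         character(1), USE.NAMES = FALSE)
}

nodes <- find_nodes(rowlist, c("thead", "tbody"))

# One matrix row per tr in tbody, with the thead cells as column names.
body <- do.call(rbind, lapply(unname(nodes$tbody), row_to_chr))
colnames(body) <- row_to_chr(nodes$thead$tr)
df <- as.data.frame(body, stringsAsFactors = FALSE)
df
```

The sketch assumes every tr has the same number of cells and that each non-empty cell holds a single string; a table with ragged rows or nested markup inside cells would need extra handling.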

Joris C.