I have been stuck on this for hours.
I am simplifying an XML file of more than 15,000 lines that contains data on lung function tests. Each XML file contains multiple tests. Using xml2 and map I can get the data into a list with one element per test.
Here is an extract of the list for two tests inside one file:
[[1]]
[[1]][[1]]
Name UM Value
"MEF75%" "L/s" "6.82"
[[1]][[2]]
Name UM Value Predicted PercPred ZScore LLN ULN
"FEV1" "L" "3.83" "4.16" "92" "-0.62" "3.27" "5.01"
...
[[2]]
[[2]][[1]]
Name UM Value
"MEF75%" "L/s" "6.65"
[[2]][[2]]
Name UM Value Predicted PercPred ZScore LLN ULN
"FEV1" "L" "3.79" "4.16" "91" "-0.69" "3.27" "5.01"
....
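In case it helps to reproduce this without the XML file, here is a minimal sketch that rebuilds the same shape by hand. The name `params` is just a placeholder, the values are copied from the extract above, and the parameters elided with "..." are left out:

```r
# Hypothetical stand-in for the parsed XML: one list element per test,
# each holding one named character vector per parameter.
params <- list(
  list(
    c(Name = "MEF75%", UM = "L/s", Value = "6.82"),
    c(Name = "FEV1", UM = "L", Value = "3.83", Predicted = "4.16",
      PercPred = "92", ZScore = "-0.62", LLN = "3.27", ULN = "5.01")
  ),
  list(
    c(Name = "MEF75%", UM = "L/s", Value = "6.65"),
    c(Name = "FEV1", UM = "L", Value = "3.79", Predicted = "4.16",
      PercPred = "91", ZScore = "-0.69", LLN = "3.27", ULN = "5.01")
  )
)

# Inspect the nesting: outer level = tests, inner level = parameters
str(params, max.level = 2)
```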
I can convert this into a tibble easily with map_dfr or bind_rows, but what I can't figure out is how to add the outer list index ([[1]] or [[2]]) as a column in the tibble. If I use the .id argument, it simply numbers the rows sequentially instead of referring to the position in the outer list:
map(trials, ~ xml_find_all(.x, "AdditionalData/Parameters/Parameter")) %>%
  map(~ xml_attrs(.x)) %>%
  bind_rows(.id = "test")
# A tibble: 104 x 9
test Name UM Value Predicted PercPred ZScore LLN ULN
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 MEF75% L/s 6.82 NA NA NA NA NA
2 2 FEV1 L 3.83 4.16 92 -0.62 3.27 5.01
...
53 53 MEF75% L/s 6.65 NA NA NA NA NA
54 54 FEV1 L 3.79 4.16 91 -0.69 3.27 5.01
What I am trying to get to is the following, where the first column ("test") holds the index of the test each row came from. This is the desired output, not something the code above produces:
# A tibble: 104 x 9
test Name UM Value Predicted PercPred ZScore LLN ULN
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 MEF75% L/s 6.82 NA NA NA NA NA
2 1 FEV1 L 3.83 4.16 92 -0.62 3.27 5.01
...
53 2 MEF75% L/s 6.65 NA NA NA NA NA
54 2 FEV1 L 3.79 4.16 91 -0.69 3.27 5.01
Is this doable with the tidyverse, or should I try to work it out with a base R loop?
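One direction that looks plausible on toy data, offered as a sketch rather than a confirmed fix for the real file: bind the parameters within each test first, so that .id is applied to a list with exactly one tibble per test. The `toy` list below is a made-up, trimmed-down stand-in for the output of xml_attrs, not the real data:

```r
library(dplyr)
library(purrr)

# Hypothetical toy input: two tests, two parameters each (columns trimmed)
toy <- list(
  list(
    c(Name = "MEF75%", UM = "L/s", Value = "6.82"),
    c(Name = "FEV1", UM = "L", Value = "3.83", Predicted = "4.16")
  ),
  list(
    c(Name = "MEF75%", UM = "L/s", Value = "6.65"),
    c(Name = "FEV1", UM = "L", Value = "3.79", Predicted = "4.16")
  )
)

result <- toy %>%
  map(bind_rows) %>%        # inner bind: one tibble per test
  bind_rows(.id = "test")   # outer bind: .id now indexes the tests

print(result)
```

On this toy input the test column repeats the outer index once per parameter. Whether the same holds for the real nodesets would need checking against the actual file.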
Any help appreciated, thanks. -BF