I have some HTML and R code like the following, and I need to relate each node value to its parent's id in a data.frame. Different pieces of information are available for each person.
example <- "<div class='person' id='1'>
<div class='phone'>555-5555</div>
<div class='email'>jhon@123.com</div>
</div>
<div class='person' id='2'>
<div class='phone'>123-4567</div>
<div class='email'>maria@gmail.com</div>
</div>
<div class='person' id='3'>
<div class='phone'>987-6543</div>
<div class='age'>32</div>
<div class='city'>New York</div>
</div>"
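library(XML)
# Parse the string as HTML and keep internal nodes so XPath queries work
doc <- htmlTreeParse(example, asText = TRUE, useInternalNodes = TRUE)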
values <- xpathSApply(doc, "//*[@class='person']/div", xmlValue)
variables <- xpathSApply(doc, "//*[@class='person']/div", xmlGetAttr, 'class')
id <- xpathSApply(doc, "//*[@class='person']", xmlGetAttr, 'id')
# The problem: create a data.frame(id,variables,values)
With xpathSApply(), I can get the phone, email, and age values as well as the person attribute (id). However, these come back as separate vectors (seven values but only three ids), and I need to match each value to the right data.frame variable and the right person. My real data contains many different fields, so naming each variable has to be automatic.
My goal is to create a data.frame like this, relating each id to its own data:
  id variables          values
1  1     phone        555-5555
2  1     email    jhon@123.com
3  2     phone        123-4567
4  2     email maria@gmail.com
5  3     phone        987-6543
6  3       age              32
7  3      city        New York
I believe I would have to write a function to use inside xpathSApply that gets a person's values and that person's id at the same time, so that they stay related, but I haven't had any success with that so far.
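A minimal sketch of that idea, assuming every detail is a direct child <div> of its person node and every person has at least one such child. The names sample_html, sample_doc and person_rows are illustrative only and not part of the code above, and this is not necessarily the idiomatic approach:

```r
library(XML)

# Shortened sample with the same structure as the HTML in the question
sample_html <- "<div class='person' id='1'>
<div class='phone'>555-5555</div>
<div class='email'>jhon@123.com</div>
</div>
<div class='person' id='3'>
<div class='age'>32</div>
<div class='city'>New York</div>
</div>"
sample_doc <- htmlParse(sample_html, asText = TRUE)

# Hypothetical helper: builds one data.frame per person node.
# The relative XPath "./div" only looks at that person's own children,
# and the single id is recycled across that person's rows.
person_rows <- function(person) {
  data.frame(
    id        = xmlGetAttr(person, "id"),
    variables = xpathSApply(person, "./div", xmlGetAttr, "class"),
    values    = xpathSApply(person, "./div", xmlValue),
    stringsAsFactors = FALSE
  )
}

# Apply the helper to each person node, then stack the pieces
pieces <- xpathApply(sample_doc, "//*[@class='person']", person_rows)
result <- do.call(rbind, pieces)
print(result)
```

Because the variable names are read from each child's class attribute, nothing has to be named by hand; a person with no child <div> would need separate handling.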
Can anyone help me?