I am using the x-ray module for the first time.
Using it is generally fine, but I run into a problem when I try to scrape data from badly formatted HTML.
For example, when I scrape this HTML from a website:
<div class="item">
<dl class="list">
<dd id="1"> Data1
<dd id="2"> Data2
<dd id="3"> Data3
</dl>
using this code:
x(html, '.item', [{
  tags: x('.item', 'dd:nth-child(1)')
}])(function(err, obj) {
  // Trim whitespace around the string values and drop newline escapes
  var jsonCleaned = JSON.parse(JSON.stringify(obj).replace(/"\s+|\s+"/g, '"').replace(/\\n/g, ''));
  res.json(jsonCleaned);
});
I get the following result:
[
{
"tags": "Data1 Data2 Data3"
}
]
If the <dd> tags are closed, the same code returns what I expect:
[
{
"tags": "Data1"
}
]
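The only workaround I can think of is to close the tags myself before handing the HTML to x-ray. Below is a minimal sketch of that idea, not a robust fix. closeDdTags is a hypothetical helper of my own and not part of x-ray, and it assumes the <dd> elements are never nested and that every list ends with </dl>:

```javascript
// Hypothetical helper (not part of x-ray): adds the missing </dd> end tags.
// Assumes <dd> elements are never nested and each list ends with </dl>.
function closeDdTags(html) {
  // Group 1: the opening <dd ...> tag.
  // Group 2: its content, matched lazily.
  // Group 3: an existing </dd>, or an empty match right before the next <dd or </dl>.
  var pattern = /(<dd\b[^>]*>)([\s\S]*?)(<\/dd>|(?=<dd\b|<\/dl>))/gi;
  return html.replace(pattern, function (match, open, body) {
    // Always emit exactly one </dd>, whether or not one was already there.
    return open + body + '</dd>';
  });
}

var sample = '<dl class="list">\n<dd id="1"> Data1\n<dd id="2"> Data2\n</dl>';
console.log(closeDdTags(sample));
```

I would then pass closeDdTags(html) instead of html as the first argument to x(...). Patching HTML with a regular expression like this feels fragile, though.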
Is there a way to get this result when the <dd> tags are not closed?