
Part of the page's HTML:

<div name="price" class="detail-price-test">
  <meta itemprop="price" content="3303">
  <meta itemprop="priceCurrency" content="test">
  <span id="price_label">3 303</span><span class="detail-price-test-sign" id="price_label_sign"> eur</span>
  <script>
    if (price_json.price != '0') {
    var price_container = document.getElementById('price_container'),
    price_cheaper_selector = 'detail-price-cheaper';
    document.getElementById('price_label').innerHTML = price_json.price_formatted;
    document.getElementById('price_label_sign').innerHTML = "&thinsp;eur";
    if (parseFloat(price_json.old_price) >
    parseFloat(price_json.price) &&
    price_container &&
    !price_container.hasClass(price_cheaper_selector)
    ) {
    price_container.addClass(price_cheaper_selector);
    }
    }
  </script>
  <link itemprop="availability" href="http://schema.org/InStock">
</div>

1) First question: How can I extract the content attribute, with value 3303, from meta itemprop="price"? Or is this impossible with osmosis?

2) Second question: Why can't I get the value 3 303 from this <span id="price_label">3 303</span>?

osmosis
  .get('myURL.com')
  .find('div.detail-price-test span#price_label') // or div.detail-price-test span[id=price_label]
  .set('test')
  .data(console.log);

Result in console: test: ''

Maybe the problem is the inline JavaScript on the page, and osmosis can't work with it?

AmerllicA
John Laybues

2 Answers

0

In Cheerio it's:

$('[itemprop="price"]').attr('content')

In osmosis? No idea, I've never heard of it.

pguardiario
  • Thanks, but I already know about attr() in Cheerio; I need to learn how to do this in osmosis. – John Laybues Dec 10 '18 at 10:25
  • If I were you, I would remove osmosis as a dependency of your project. It is some sort of hobby library. – pguardiario Dec 10 '18 at 11:37
  • Don't underestimate Osmosis. Yes it has pros and cons but definitely very useful for crawling. No library out there can crawl faster than what osmosis can do. It has its purpose. I used request-cheerio back then. I tried selenium, puppeteer, nightmare, phantomjs as well. But I always go back to osmosis, whenever I need something scraped quick. – Sachi Dec 23 '18 at 03:15
0

Your selector is incorrect.

First question answer: The selector should be 'meta[itemprop="price"]@content'. You can do something like this:

osmosis
  .get('myURL.com')
  .find('meta[itemprop="price"]@content')
  .set('price')
  .data(console.log); // {price : 3303}

Second question answer: The correct selector is one of:

  1. 'div.detail-price-test > span#price_label'
  2. 'span#price_label'
  3. '#price_label'

Do something like:

osmosis
  .get('myURL.com')
  .find('div.detail-price-test > span#price_label')
  .set('test')
  .data(console.log); // {test : 3...}
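
The label text comes back as a string with a grouping space ("3 303"), while the meta tag holds the bare number (3303). If you need a number from the label, a minimal sketch of the clean-up could look like this. Note that parsePriceLabel is a hypothetical helper written for this answer, not part of osmosis, and treating a comma as the decimal separator is an assumption about the site's formatting:

```javascript
// Hypothetical helper (not an osmosis API): convert a scraped price label
// such as "3 303" into the number 3303.
function parsePriceLabel(label) {
  // In JavaScript, \s also matches the no-break space (U+00A0) and the
  // thin space (U+2009), which are commonly used as digit group separators.
  const compact = String(label).replace(/\s/g, '');
  // Assumption: a comma, if present, is a decimal separator ("3 303,50").
  const value = Number.parseFloat(compact.replace(',', '.'));
  // Return null instead of NaN when the label holds no number at all.
  return Number.isNaN(value) ? null : value;
}

console.log(parsePriceLabel('3 303')); // 3303
```

In the chain above you could then replace .data(console.log) with a callback that calls parsePriceLabel on the test field of the object it receives.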
Sachi