I'm using the cheerio library as a scraper in my Node.js project. I want to parse the following structure:

<li class="sub menu-category-main">
  <p>
    <span class="price">$16.00</span>
    ZESTAW DNIA + ZUPA        
  </p>
</li>
<li class=" ">
  <p>
    <span class="price">$12.00</span>
    <img class="allergens" title="Vegerarian" src="/new_site/img/vegetarian_.png">
    NALEŚNIKI AMERYKAŃSKIE Z SOSEM OWOCOWYM
    <br>
    american pancakes with fruit sauce
  </p>
</li>
<li class=" ">
  <p>
    <span class="price">$11.00</span>
    <img class="allergens" title="lactose free" src="/new_site/img/lactose_.png">
    <img class="allergens" title="gluten free" src="/new_site/img/gluten_.png">
    <img class="allergens" title="Vegerarian" src="/new_site/img/vegetarian_.png">
    LECZO WEGETARIAŃSKIE
    <br>
    vegetables lecho
  </p>
</li>

How can I parse this HTML to get the price, the name, and the list of images for each item? In the end I want to build a JSON object so I can reuse the data (I know how to build the JSON; I only have problems parsing the HTML above).

Note that the names appear in both English and Polish. I'm only interested in the Polish strings. Also note that the structure of this document is irregular: not every item has images or an English name.

I should add that calling .text() on the "p" element does not give me the result I want, since it returns the price and both names mixed together in one string.

mysliwiec_tech
  • Maybe this could help you? https://stackoverflow.com/questions/20832910/get-text-in-parent-without-children-using-cheerio – DevNico Sep 13 '18 at 15:27
  • @Nicolas You are amazing! It works perfectly :-) Thank you! – mysliwiec_tech Sep 13 '18 at 20:20
  • Possible duplicate of [Get text in parent without children using cheerio](https://stackoverflow.com/questions/20832910/get-text-in-parent-without-children-using-cheerio) – mihai Nov 03 '18 at 12:00
