I was looking at a page of reviews and tried to scrape the reviews from it (although the site provides an API for this).
I saw that each review is embedded inside an li tag, and each li tag contains many other tags.
Inside it there is one div with the class name review-wrapper, which contains the review text and its rating.
Is it possible to write a script that finds all such containers and scrapes the review text, image (if one exists), rating and date?
Is regex the correct way to do this, or is DOM parsing more suitable?
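For comparison, here is a minimal sketch of the DOM-style route in Python, using only the standard library's html.parser. Regex tends to break on nested markup like this, so the sketch walks the tags instead and tracks div nesting to know when a review-wrapper container ends. The class name review-wrapper and the itemprop values ratingValue, datePublished and description come from the page markup; the names ReviewParser and parse_reviews are hypothetical, and skipping images with the class offscreen (the star sprite) is an assumption about which images matter.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect rating, date, text and image URLs from each review-wrapper div."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.reviews = []      # finished reviews, one dict per container
        self._current = None   # review being built, or None outside a container
        self._depth = 0        # div nesting depth inside the current container
        self._in_text = False  # True while inside the description <p>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._current is None:
            # Only a div with class review-wrapper opens a new review.
            if tag == "div" and "review-wrapper" in classes:
                self._current = {"rating": None, "date": None,
                                 "text": "", "images": []}
                self._depth = 1
            return
        if tag == "div":
            self._depth += 1
        elif tag == "meta":
            prop, content = attrs.get("itemprop"), attrs.get("content")
            if prop == "ratingValue" and content:
                self._current["rating"] = float(content)
            elif prop == "datePublished":
                self._current["date"] = content
        elif tag == "p" and attrs.get("itemprop") == "description":
            self._in_text = True
        elif tag == "br" and self._in_text:
            self._current["text"] += "\n"
        elif tag == "img" and "offscreen" not in classes and attrs.get("src"):
            # Assumption: offscreen images are the rating sprite, not photos.
            self._current["images"].append(attrs["src"])

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == "p":
            self._in_text = False
        elif tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # The review-wrapper div itself just closed.
                self._current["text"] = self._current["text"].strip()
                self.reviews.append(self._current)
                self._current = None

    def handle_data(self, data):
        if self._current is not None and self._in_text:
            self._current["text"] += data


def parse_reviews(html):
    """Return a list of review dicts found in an HTML string."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.reviews
```

Calling parse_reviews(page_html) on the downloaded page source would give one dict per review with the keys rating, date, text and images. A review whose wrapper div is never closed in the input is not returned, so truncated markup needs separate handling.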
Page link: http://www.yelp.com/biz/franchino-san-francisco?start=80
Here is the code snippet:
<div class="review-wrapper">
<div class="review-content">
<div class="biz-rating biz-rating-very-large clearfix">
<div itemprop="reviewRating" itemscope itemtype="http://schema.org/Rating">
<div class="rating-very-large">
<i class="star-img stars_5" title="5.0 star rating">
<img alt="5.0 star rating" class="offscreen" height="303" src="http://s3-media3.ak.yelpcdn.com/assets/2/www/img/c2252a4cd43e/ico/stars/v2/stars_map.png" width="84">
</i>
<meta itemprop="ratingValue" content="5.0">
</div>
</div>
<span class="rating-qualifier">
<meta itemprop="datePublished" content="2013-10-28">
10/28/2013
</span>
</div>
<p class="review_comment ieSucks" itemprop="description" lang="en">The reason I started a yelp account, was to write a review for Franchinos. This is my favorite restaurant in the city of San Francisco, and especially, North Beach. <br><br>Where do I start... I take every friend, family member and acquaintance to Franchinos in every opportunity I can. I am a Italy-nut and have been over three times - the mood + atmosphere is almost identical. It is a 100% family-run restaurant and you can taste the expertise and 'home-cooking'. <br><br>Each time, I get a large bottle of wine (One time - they ran out of the wine I had ordered - and instead gave me a larger, more expensive bottle - same price), a wonderful pasta dish (Alfredo, carbonara.. etc.) and a Caesar salad.<br><br>Need I say more? Buenisimo. I look forward to the next time.. and the times after that again and again. <br><br>è perfetto!</p>
</div>
<div class="review-footer clearfix">
<div class="rateReview ufc-feedback clearfix" data-review-id="SnZ4Q97nJdR7a-fot-Slcw">
<p class="review-intro review-message">
Was this review …?
</p>