I want to extract amazon reviews and all its related data like: Name of reviewer, rating, content, and (if possible) comments in response to that review. I am using python 3.7.
Asked
Active
Viewed 1,962 times
-1
-
You can use selenium like tool or head less chrome – Rohit Dhiman Sep 29 '18 at 07:39
-
Thanks Rohit.. It helped me a lot. Now the issue remaining is to crawl star ratings against each review. Rather all information are extracted by your kind suggestion – Yawar Abbas Sep 29 '18 at 12:24
-
you can simply ask for API from amazon that will be faster and reliable. – Rohit Dhiman Sep 29 '18 at 12:26
-
or if you don't have API then by using above method you can look for particular element in dom – Rohit Dhiman Sep 29 '18 at 12:27
-
I will try it.. as I have not a deep knowledge about API – Yawar Abbas Sep 29 '18 at 12:29
-
I have added an answer, if you found it helpful please accept it – Rohit Dhiman Sep 29 '18 at 12:35
-
also check this out https://stackoverflow.com/questions/11080584/is-it-legal-to-crawl-amazon?rq=1 – Rohit Dhiman Sep 29 '18 at 13:07
1 Answers
0
You can achieve that by two methods:
- API (reliable fast)
Ask for API from Amazon
- Tools like Headless chrome or Selenium check this post
Find DOM element in the page like
<div id="R1IZDPP09RA69A" data-hook="review" class="a-section review">
<div id="customer_review-R1IZDPP09RA69A" class="a-section celwidget" data-cel-widget="customer_review-R1IZDPP09RA69A">
<div class="a-row a-spacing-mini">
<a href="/gp/profile/amzn1.account.AGFB356ZQJQAHWZZABTTNIFGYMDA/ref=cm_cr_dp_d_gw_tr?ie=UTF8" class="a-profile" data-a-size="small">
<div aria-hidden="true" class="a-profile-avatar-wrapper">
<div class="a-profile-avatar">
<img src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg" class="" data-src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg">
<noscript><img src="https://images-eu.ssl-images-amazon.com/images/S/amazon-avatars/f0d86e6d-45d4-4cc7-a4b8-a062450c2c75._CR0,0,335,335_SX48_.jpg"></noscript>
</div>
</div>
<div class="a-profile-content"><span class="a-profile-name">Sana Khanam</span></div>
</a>
</div>
<div class="a-row"><a class="a-link-normal" title="4.0 out of 5 stars" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=B079Q9VNWQ"><i data-hook="review-star-rating" class="a-icon a-icon-star a-star-4 review-rating"><span class="a-icon-alt">4.0 out of 5 stars</span></i></a><span class="a-letter-space"></span><a data-hook="review-title" class="a-size-base a-link-normal review-title a-color-base a-text-bold" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=B079Q9VNWQ">Amazing service by VOLTAS !!Appreciated </a></div>
<span data-hook="review-date" class="a-size-base a-color-secondary review-date">17 May 2018</span>
<div class="a-row a-spacing-mini review-data review-format-strip"><span data-hook="avp-badge-linkless" class="a-size-mini a-color-state a-text-bold">Verified Purchase</span></div>
<div class="a-row a-spacing-small review-data">
<span data-hook="review-body" class="a-size-base review-text">
<div aria-live="polite" data-a-expander-name="review_text_read_more" data-a-expander-collapsed-height="300" class="a-expander-collapsed-height a-row a-expander-container a-expander-partial-collapse-container" style="max-height: none; height: 300px;">
<div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content a-expander-partial-collapse-content" style="padding-bottom: 19px;">5 th day after having AC installed.<br>PROS:<br><br>-Cooling nice<br>-Looks good<br>-Good one in this price range<br>-Fast Installation within 24h<br>- Customer support response appriciated<br><br>CONS<br>-Started making weird little noises while decreasing temperature.<br><br>-I don't understand but some unpleasant smell being diffused after starting it at the 5th day of installation.<br><br>-Contacted seller, issue resolved!<br><br>Overall would RECOMMEND!! Go for it.</div>
<div class="a-expander-header a-expander-partial-collapse-header" style="opacity: 1; display: block;">
<div class="a-expander-content-fade"></div>
<a href="javascript:void(0)" data-hook="expand-collapse-read-more-less" data-action="a-expander-toggle" class="a-declarative" data-a-expander-toggle="{"allowLinkDefault":true, "expand_prompt":"Read more", "collapse_prompt":"Read less"}"><i class="a-icon a-icon-extender-expand"></i><span class="a-expander-prompt">Read more</span></a>
</div>
</div>
</span>
</div>
<div data-hook="review-comments" class="a-row review-comments">
<span class="cr-vote" data-hook="review-voting-widget">
<div class="a-row a-spacing-small"><span data-hook="helpful-vote-statement" class="a-size-base a-color-tertiary cr-vote-text">5 people found this helpful</span></div>
<div class="cr-helpful-button aok-float-left">
<span class="a-button a-button-base" id="a-autoid-14">
<span class="a-button-inner">
<a href="https://www.amazon.in/ap/signin?openid.return_to=https%3A%2F%2Fwww.amazon.in%2Fdp%2FB079Q9VNWQ%2Fref%3Dcm_cr_dp_d_vote_lft%3Fie%3DUTF8%26voteInstanceId%3DR1IZDPP09RA69A%26voteValue%3D1%26csrfT%3DgiYnW3e%252Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%252F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%252F%252F%252F%252F%23R1IZDPP09RA69A&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.assoc_handle=inflex&openid.mode=checkid_setup&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0" data-hook="vote-helpful-button" class="a-button-text" role="button" id="a-autoid-14-announce">
<div class="cr-helpful-text">
Helpful
</div>
</a>
</span>
</span>
</div>
</span>
<i class="a-icon a-icon-text-separator" role="img" aria-label="|"></i><a data-hook="review-comment" class="a-size-base a-link-normal a-color-secondary a-text-normal" href="/gp/customer-reviews/R1IZDPP09RA69A/ref=cm_cr_dp_d_rvw_btm?ie=UTF8&ASIN=B079Q9VNWQ#wasThisHelpful">Comment</a><span class="cr-footer-line-height">
<span><i class="a-icon a-icon-text-separator" role="img" aria-label="|"></i><span class="a-declarative" data-action="cr-popup" data-cr-popup="{"width":"580","title":"ReportAbuse","url":"/hz/reviews-render/report-abuse?ie=UTF8&voteDomain=Reviews&ref=cm_cr_dp_d_rvw_hlp&csrfT=giYnW3e%2Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%2F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%2F%2F%2F%2F&entityId=R1IZDPP09RA69A&sessionId=257-1905223-9805712","height":"380"}"><a class="a-size-base a-link-normal a-color-secondary report-abuse-link a-text-normal" href="/hz/reviews-render/report-abuse?ie=UTF8&voteDomain=Reviews&ref=cm_cr_dp_d_rvw_hlp&csrfT=giYnW3e%2Fv7p8y07m5Je2hA3LGXQ2gVKHQWzD%2F40AAAAJAAAAAFuvcotyYXcAAAAAFVfwLBerPie4v1Ep%2F%2F%2F%2F&entityId=R1IZDPP09RA69A&sessionId=257-1905223-9805712">Report abuse</a></span></span></span>
</div>
</div>
</div>
here you can find reviewer name rating and review under each div
having attribute data-hook="review"
.
Name under <span class="a-profile-name">Sana Khanam</span>
Rating <span class="a-icon-alt">4.0 out of 5 stars</span>
Review <div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content a-expander-partial-collapse-content" style="padding-bottom: 19px;">5 th day after...</div>

Rohit Dhiman
- 2,691
- 18
- 33
-
1. review_title = driver.find_elements_by_xpath('//a[@data-hook="review-title"]') 2. review_content = driver.find_elements_by_xpath('//span[@data-hook="review-body"]') 3. review_date = driver.find_elements_by_xpath('//span[@data-hook="review-date"]') 4. helpful_vote = driver.find_elements_by_xpath('//span[@data-hook="helpful-vote-statement"]') 5. rating = driver.find_elements_by_xpath('//span[@class="a-icon-alt"]') No.5 is not working.. above 4 are working – Yawar Abbas Sep 29 '18 at 12:56
-
-
@data-hook="review-star-rating" and then extract rating from that – Rohit Dhiman Sep 29 '18 at 13:03
-
@data-hook="review-star-rating" and then extract rating from that still this is not giving the desired results – Yawar Abbas Sep 29 '18 at 18:13
-
rating = driver.find_element_by_xpath('//i[@data-hook="review-star-rating"]') Now this is working but as I am getting values in text so these stars are not going to be converted in text. is there any way in python to get these stars or we can convert these to text?? – Yawar Abbas Sep 30 '18 at 12:19
-
Save them into numerical value, after that you can show them as graphics or text in html or any frontend – Rohit Dhiman Sep 30 '18 at 12:28
-