I am trying to extract the rating-distribution image, located by its XPath, from this App Store page: https://apps.apple.com/us/app/mercer-marketplace-benefits/id1041417557

I tried the following code using the xpath:

driver.get('https://apps.apple.com/us/app/mercer-marketplace-benefits/id1041417557')
rating_distr = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.XPATH, """(//*[@id="ember290"]/div/div[2])""")))
print(rating_distr.get_attribute('innerHTML'))

But the output is not an image:

    <figure class="we-star-bar-graph">
    <div class="we-star-bar-graph__row">
      <span class="we-star-bar-graph__stars we-star-bar-graph__stars--5"></span>
      <div class="we-star-bar-graph__bar">
        <div class="we-star-bar-graph__bar__foreground-bar" style="width: 76%;"></div>
      </div>
    </div>
    <div class="we-star-bar-graph__row">
      <span class="we-star-bar-graph__stars we-star-bar-graph__stars--4"></span>
      <div class="we-star-bar-graph__bar">
        <div class="we-star-bar-graph__bar__foreground-bar" style="width: 12%;"></div>

Is there any way to extract the output as an image? Thanks for the help!

  • Do you want the number of 5-star reviews or the %? – supputuri Jul 24 '19 at 21:47
  • Hopefully I can get both images. The xpath is for the %. – Arthur Morgan Jul 24 '19 at 21:56
  • There is no image source for these tags. You can take a Selenium screenshot with coordinates around these elements. Take a look at this solution - https://stackoverflow.com/questions/13832322/how-to-capture-the-screenshot-of-a-specific-element-rather-than-entire-page-usin – Sureshmani Kalirajan Jul 24 '19 at 22:29
  • You don't really want an image though, right? You just want the info that the "image" contains. You'd be better served (and it would be WAY faster) if you just parsed out the class names and styles that actually contain the star values, e.g. `we-star-bar-graph__stars--5` at 76% and `we-star-bar-graph__stars--4` at 12%. Dump that all into a sheet and it's much faster to process than dumping a ton of screenshots and having to sort it all out by opening each one manually, examining it, and then recording the values. – JeffC Jul 24 '19 at 23:24
  • @JeffC Is there any better way to extract the contents in the "image"? Because my outputs contain a lot of
    – Arthur Morgan Jul 25 '19 at 13:28
  • @ArthurMorgan I added an answer explaining because there's not enough room in a comment. – JeffC Jul 25 '19 at 15:47

2 Answers

As I suggested in my comment, I think a better and faster approach is to get the values directly instead of taking a screenshot. If you take a screenshot, someone will have to open it manually and record the values in some other format, which is a long and tedious process. Instead, just scrape the data from the page and dump it in the final desired format.

For example, look at the HTML for just the 5-star rating bar:

<div class="we-star-bar-graph__row">
    <span class="we-star-bar-graph__stars we-star-bar-graph__stars--5"></span>
    <div class="we-star-bar-graph__bar">
        <div class="we-star-bar-graph__bar__foreground-bar" style="width: 76%;"></div>
    </div>
</div>

You can see that there's a class applied, we-star-bar-graph__stars--5, that indicates what star rating it is. You can also see that the width of the bar is set, style="width: 76%;", so that tells you the % of 5-star ratings. With that info, we can scrape the rating for each star.

from selenium.webdriver.common.by import By

ratings = driver.find_elements(By.CSS_SELECTOR, "figure.we-star-bar-graph div.we-star-bar-graph__bar__foreground-bar")
# get the rendered width of the entire bar in pixels (strip the trailing "px")
width = float(driver.find_element(By.CSS_SELECTOR, ".we-star-bar-graph__bar").value_of_css_property("width")[:-2])
for i in range(len(ratings), 0, -1):
    # get the rendered width of this rating's foreground bar in pixels
    rating = float(ratings[len(ratings) - i].value_of_css_property("width")[:-2])
    # the ratio of the two widths is the percentage for this star rating
    print(f"{i}-star rating: {rating / width * 100:.0f}%")

This should dump values like

5-star rating: 76%
4-star rating: 12%
3-star rating: 4%
2-star rating: 1%
1-star rating: 6%

That might not be your final desired format but it should get you pointed in the right direction.
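As a variation on the approach above, the percentage can also be read straight from the inline style attribute rather than from the computed pixel width, which avoids dividing by the bar width. The following is a minimal sketch, assuming the markup keeps the shape shown above (a `we-star-bar-graph__stars--N` class followed by a `width: P%` style in each row). `parse_star_percentages` and `to_csv` are hypothetical helper names, and the sample HTML is the fragment from the question, not a live page.

```python
import csv
import io
import re

# Matches one row: the star count from the class name, then the percentage
# from the inline style of the foreground bar that follows it.
ROW_PATTERN = re.compile(
    r'we-star-bar-graph__stars--(\d)"'   # star count, e.g. 5
    r'.*?'                               # anything up to the bar's style
    r'style="width:\s*([\d.]+)%',        # percentage, e.g. 76
    re.DOTALL,
)

def parse_star_percentages(html):
    """Return {stars: percent} parsed from the graph's HTML (hypothetical helper)."""
    return {int(stars): float(pct) for stars, pct in ROW_PATTERN.findall(html)}

def to_csv(percentages):
    """Render the mapping as CSV text, highest star rating first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["stars", "percent"])
    for stars in sorted(percentages, reverse=True):
        writer.writerow([stars, percentages[stars]])
    return buffer.getvalue()

# Sample fragment copied from the question; with Selenium this string could
# come from element.get_attribute("innerHTML") instead.
SAMPLE_HTML = '''
<div class="we-star-bar-graph__row">
  <span class="we-star-bar-graph__stars we-star-bar-graph__stars--5"></span>
  <div class="we-star-bar-graph__bar">
    <div class="we-star-bar-graph__bar__foreground-bar" style="width: 76%;"></div>
  </div>
</div>
<div class="we-star-bar-graph__row">
  <span class="we-star-bar-graph__stars we-star-bar-graph__stars--4"></span>
  <div class="we-star-bar-graph__bar">
    <div class="we-star-bar-graph__bar__foreground-bar" style="width: 12%;"></div>
  </div>
</div>
'''

if __name__ == "__main__":
    print(to_csv(parse_star_percentages(SAMPLE_HTML)))
```

Because this works on a plain string, it can be tried without a browser; the CSV text could then be written to a file if a spreadsheet is the desired final format.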

JeffC
  • Thank you so much! Your answer solved my question so well! – Arthur Morgan Jul 25 '19 at 15:55
  • I applied the code above but received "TypeError: must be str, not WebElement". Not sure why I got the error... – Arthur Morgan Jul 25 '19 at 16:02
  • Thanks for the update. But I got a different output: "5-star rating: 210.275px 4-star rating: 33.2px 3-star rating: 11.0625px 2-star rating: 2.7625px 1-star rating: 16.6px" – Arthur Morgan Jul 25 '19 at 16:37
  • BTW, how can I see the outputs from the "find_elements_by_css_selector"? – Arthur Morgan Jul 25 '19 at 16:38
  • I think I've fixed it... I don't have python locally so I was running things from the browser console and it was returning the %s. I updated the code to get the full width of the bar and then do the math to calculate the %. I think I've got it but you'll have to test it and let me know. – JeffC Jul 25 '19 at 17:19
  • To see the output from the finds, in Chrome use `$$("figure.we-star-bar-graph div.we-star-bar-graph__bar__foreground-bar")` in the dev console. You can then expand the return and see each one. As you hover over each return, they will be highlighted on the page or if you click on one, that element will be selected in the DOM. `$$()` is for CSS selectors and `$x()` is for XPaths. – JeffC Jul 25 '19 at 17:21

Open the webpage and scroll to the element by its id. When I checked, the id of the part of the page you want was "ember290".

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import pyscreenshot as ImageGrab

browser = webdriver.Chrome()  # we are using Chrome as our web browser

browser.get('https://apps.apple.com/us/app/mercer-marketplace-benefits/id1041417557')
#rating_distr = WebDriverWait(browser,30).until(EC.presence_of_element_located((By.XPATH, """(//*[@id="ember290"]/div/div[2])""")))

# scroll the ratings section into view
ActionChains(browser).move_to_element(browser.find_element(By.ID, 'ember290')).perform()

# grab the full screen
im = ImageGrab.grab()
im.show()

# grab a region; bbox is (x1, y1, x2, y2) in screen pixels, so x2 > x1 and y2 > y1
# these coordinates are placeholders; adjust them for your screen
im = ImageGrab.grab(bbox=(162, 500, 500, 650))
im.show()

# save the cropped grab to disk
im.save('im.png')

Take a screenshot once the scrolling is done.
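Rather than hard-coding coordinates, the crop box can be derived from the element's position and size. This is a minimal sketch, assuming the `location` and `size` dictionaries have the shape Selenium returns for a WebElement (`x`/`y` and `width`/`height`), and that the image being cropped is a browser viewport screenshot rather than a whole-desktop grab. `element_crop_box` is a hypothetical helper, `scale` stands in for the device pixel ratio on high-DPI displays, and the example numbers are made up.

```python
def element_crop_box(location, size, scale=1.0, scroll_x=0, scroll_y=0):
    """Return (left, top, right, bottom) in screenshot pixels.

    location / size mirror the dicts Selenium exposes on a WebElement;
    scale is the device pixel ratio; scroll_x / scroll_y are the page's
    current scroll offsets, since a viewport screenshot starts there.
    """
    left = (location["x"] - scroll_x) * scale
    top = (location["y"] - scroll_y) * scale
    right = left + size["width"] * scale
    bottom = top + size["height"] * scale
    return tuple(round(v) for v in (left, top, right, bottom))

# Example values are made up for illustration, not measured from the page.
box = element_crop_box({"x": 162, "y": 650}, {"width": 338, "height": 120}, scroll_y=600)
print(box)
```

The tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it could be applied to an image saved with the driver's `save_screenshot`. Selenium's Python bindings also expose a `screenshot` method on the element itself, which may make the cropping unnecessary.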

Vignesh SP