I have made many attempts, and all of them fail to record the data I need reliably and completely. I understand the basics of Python and Selenium for automating simple tasks, but in this case the content is dynamically generated and I cannot find the correct way to access and then record all the data I need.

The URL I am looking to scrape content from is structured similarly to the following:

https://dutchie.com/embedded-menu/revolutionary-clinics-somerville/menu

In particular, I am trying to grab all the info using something like:

browser.find_elements_by_xpath('//*[@id="products-container"]')

Is this the right approach? How do I access specific sub-elements of this element (and all elements with the same path)?

I have read that I might need beautifulsoup4, but I am unsure of the best way to approach this.

Would the best approach be to use XPaths? If so, is there a way to iterate through all elements and record all the data within, or do I have to specify each and every data point that I am after?

Any assistance to point me in the right direction would be extremely helpful as I am still learning and have hit a roadblock in my progress.

My end goal is a list of all product names, prices, and any other data points that I deem relevant for the specific exercise at hand. If I could find the correct way to access the data points, I could then store them and compare or report on them as needed.

Thank you!

T0ne
  • Check the approaches here: https://stackoverflow.com/questions/67148905/python-web-scraping-for-walmart/67161826#67161826 and https://stackoverflow.com/questions/67165356/feed-dataframe-with-webscraping/67166294#67166294. It's a common question. – vitaliis Apr 30 '21 at 23:23
  • This is a great start. I am getting lost at how I would select certain elements in my example. If the text I was after was contained in a DIV with the class "product-information__Title-sc-65h5ke-4 eBIyJW", how would I approach this, assuming the text at the end changes, for instance? – T0ne May 01 '21 at 01:00
  • It's a different question and should be asked separately. Usually locators should be unique. – vitaliis May 01 '21 at 01:12

1 Answer

I think you are looking for something like

browser.find_elements_by_css_selector('[class*="product-information__Title"]')

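This should find all elements whose class attribute contains that string. The `*=` operator is a substring match, so it still works when the generated suffix (such as "-sc-65h5ke-4 eBIyJW") changes; `^=` would match only when the attribute value begins with the string.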
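To iterate over every match and record its text, something along these lines might work. It is only a sketch: the helper names `collect_texts` and `scrape_menu` are made up for illustration, the 20-second timeout is arbitrary, and the selector is assumed from the class quoted in the comments, so check it against the live page. It uses the `find_elements(by, value)` form, which newer Selenium releases expect in place of `find_elements_by_css_selector`.

```python
# Sketch only: the selector is an assumption based on the class name quoted
# in the comments; verify it in the browser's developer tools.
CSS = "css selector"  # same value as selenium's By.CSS_SELECTOR

TITLE_SELECTOR = '[class*="product-information__Title"]'


def collect_texts(browser, selector=TITLE_SELECTOR):
    """Return the stripped, non-empty text of every element matching selector."""
    elements = browser.find_elements(CSS, selector)
    texts = [el.text.strip() for el in elements]
    return [t for t in texts if t]


def scrape_menu(url):
    """Open url in Chrome, wait for the product titles, and return their text."""
    # Imported here so collect_texts can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # The menu is rendered by JavaScript, so wait until titles exist.
        WebDriverWait(browser, 20).until(
            EC.presence_of_all_elements_located((CSS, TITLE_SELECTOR))
        )
        return collect_texts(browser)
    finally:
        browser.quit()
```

Calling `scrape_menu("https://dutchie.com/embedded-menu/revolutionary-clinics-somerville/menu")` would then return a list of title strings, provided the page still uses that class fragment. Prices or other fields could be collected the same way by passing a different selector to `collect_texts`.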

C. Peck