I'm currently working on a project that scrapes grocery store pages for data matching a search query (e.g., cereal) and displays the results in a Spinner view. However, I'm having difficulty finding a way to scrape the data off the pages. I tried Jsoup, as that was the consensus recommendation online, but it doesn't execute JavaScript.

The issue is that most, if not all, sites like these use DOM storage to keep stock listings and prices up to date. That's why libraries like Jsoup won't work: they return only the HTML as served, without anything JavaScript adds afterwards. I currently have a prototype that displays the page in a WebView, but I see no way of getting the data out of it.

I've tried to research how to get around this, but honestly it's quite confusing, and I haven't found an elegant solution, if one even exists.

If anyone can help, or at the very least point me in the right direction, that would be most appreciated! Thanks ^_^

trcon
  • What about moving the scraping to a server, using something like Selenium to do the scraping, and then having the Android app query your server for the data? – Federico klez Culloca Nov 19 '21 at 10:32
  • Maybe this will help you https://stackoverflow.com/questions/17399055/android-web-scraping-with-a-headless-browser – Cherdenko Nov 19 '21 at 10:54
  • Try this answer. It should give you an idea of how to access the data loaded in the background with JavaScript. https://stackoverflow.com/questions/66518872/unable-to-retrieve-table-elements-using-jsoup/66519504#66519504 – Krystian G Nov 19 '21 at 22:18

1 Answer

Selenium would be a good option for web scraping: https://www.selenium.dev/ It drives a real browser, so it has access to the page's DOM after JavaScript has run. In my experience, a dynamically generated web page can still be difficult to scrape. RegExp will be your friend: https://regexone.com/
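
To make the RegExp suggestion concrete, here is a minimal sketch in plain Java. It assumes you already have the rendered HTML as a string (for example from Selenium's `getPageSource()`), and it assumes prices appear as `<span class="price">$3.49</span>`. That markup, the class name `PriceExtractor`, and the method `extractPrices` are all made up for illustration; you would need to inspect the real pages and adjust the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceExtractor {
    // Assumed (hypothetical) markup: <span class="price">$3.49</span>.
    // Real grocery sites will use different tags and class names.
    private static final Pattern PRICE =
            Pattern.compile("<span class=\"price\">\\s*\\$(\\d+\\.\\d{2})\\s*</span>");

    // Returns every price found in the rendered HTML, without the dollar sign.
    static List<String> extractPrices(String renderedHtml) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(renderedHtml);
        while (m.find()) {
            prices.add(m.group(1)); // group 1 is the numeric part, e.g. "3.49"
        }
        return prices;
    }

    public static void main(String[] args) {
        // Stand-in for HTML captured after JavaScript has run.
        String html = "<li>Corn Flakes <span class=\"price\">$3.49</span></li>"
                + "<li>Granola <span class=\"price\"> $12.00 </span></li>";
        System.out.println(extractPrices(html)); // prints [3.49, 12.00]
    }
}
```

Keep in mind that a pattern like this breaks as soon as the site changes its markup. Once you have the rendered HTML, feeding it to an HTML parser such as Jsoup is likely to be more robust than regular expressions alone.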

LoginCodes