
MY CONTEXT:

I'm working on a personal project to automate a relatively complex, interrelated workflow across two independent "3rd party service" websites by developing a Chrome extension under MV3. This involves interacting with both sites, as well as reading from and writing to Dropbox and Google Sheets.

In a nutshell, my Chrome MV3 extension:

  1. monitors a personal Dropbox folder;
  2. triggers new requests to the FIRST website based on newly detected files in Dropbox;
  3. monitors and detects changes in those requests in the FIRST website;
  4. triggers related new requests on the SECOND website and tracks the corresponding changes there until completion.

My extension maintains a log of all these interactions and current request status in a Google Sheet.

On both websites, monitoring changes in the requests requires navigating to a specific "history" area (potentially spanning multiple pages), "scraping" information for each request, comparing it with the previous information stored in my Google Sheet, and updating the sheet accordingly.
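The compare-and-update step can live in a small pure function that does not depend on either website, which keeps it easy to test. A minimal sketch, assuming each scraped row and each sheet row carries a request `id` and a `status` (both hypothetical field names):

```javascript
// Compare freshly scraped history rows with the rows last logged in the sheet.
// `id` and `status` are hypothetical field names; adapt to the real columns.
function diffRequests(previousRows, scrapedRows) {
  // Index the previously logged rows by request id for O(1) lookups.
  const previousById = new Map(previousRows.map((row) => [row.id, row]));
  const added = [];
  const changed = [];
  for (const row of scrapedRows) {
    const before = previousById.get(row.id);
    if (before === undefined) {
      // Request not yet in the sheet: needs a new row.
      added.push(row);
    } else if (before.status !== row.status) {
      // Known request whose status moved on: needs its row updated.
      changed.push({ id: row.id, from: before.status, to: row.status });
    }
  }
  return { added, changed };
}
```

For instance, if the sheet lists R1 as `pending` and the scrape shows R1 as `done` plus a new R2, `added` holds the R2 row and `changed` holds `{ id: 'R1', from: 'pending', to: 'done' }`; only those rows then need to be written back to the sheet.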

I first implemented the "history scraping and analysis" by having Chrome navigate and flick through the history pages. I got this working, but I did not like the approach: the visible "flicking" through multiple pages did not make for a good user experience.

So, in order to provide a smoother user experience I decided to visually hide the "history scraping and analysis" work from the user by:

  1. "fetching" each history page in the background (with new XMLHttpRequest()) into a "DOM" variable;
  2. scraping the DOM held in that variable;
  3. simply keeping the user informed with progress messages (like "analysing requests on page 1", "analysing requests on page 2", etc.).
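The background fetch-and-parse steps above could look roughly like the minimal sketch below, which swaps in `fetch()` for `new XMLHttpRequest()` and uses `DOMParser` to build the DOM variable. The `page` query parameter and the row selector are hypothetical, since the sites' real URL schemes are not shown. The parsing part has to run where `DOMParser` exists (a content script, popup, or offscreen document), not in the MV3 service worker.

```javascript
// Hypothetical: assumes the history list is paginated with a `page` query parameter.
function historyPageUrl(baseUrl, pageNumber) {
  const url = new URL(baseUrl);
  url.searchParams.set('page', String(pageNumber)); // adds or replaces `page`
  return url.toString();
}

// Download one history page and parse it into a detached Document.
// MV3 service workers have neither DOMParser nor XMLHttpRequest, so call
// this from a content script, popup, or offscreen document.
async function fetchHistoryDocument(pageUrl) {
  const response = await fetch(pageUrl, { credentials: 'include' });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${pageUrl}`);
  }
  const html = await response.text();
  return new DOMParser().parseFromString(html, 'text/html');
}

// Collect the trimmed text of each history row; `rowSelector` is a placeholder.
function scrapeRows(doc, rowSelector) {
  return Array.from(doc.querySelectorAll(rowSelector), (el) => el.textContent.trim());
}
```

A loop would call `fetchHistoryDocument(historyPageUrl(base, n))` for each page and pass the result to `scrapeRows`. Note that the parsed document contains only the HTML the server sent; no page scripts run inside it.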

This is working PERFECTLY on my FIRST website.

MY PROBLEM:

The SECOND website is clearly designed differently.

From my research, I think the second website might be built with Angular, but I'm not certain of that.

PLEASE NOTE: I cannot provide direct access to the site because it is protected by a username and password, so I'm attaching relevant screenshots.

When I have Chrome physically navigate to the "history area" and flick through the page(s), they render "fully" in the browser.

Specifically, the information I need to scrape seems to be inside a custom tag. Its contents are visible in Chrome's "Developer Tools" and accessible with standard "document.getElement(s).*" calls in my scripts, so I can scrape all the information I need.

[Screenshot: sample site with DevTools open]

HOWEVER:

When I "fetch" the page(s) into a variable in my script (using new XMLHttpRequest()), the DOM document does get stored in the variable, but the custom tag (where I believe all the history content should be) is empty, so there is nothing to scrape.


I then also realised that even when I physically navigate Chrome to the actual history page, the PAGE SOURCE (available through View->Developer->View Source) does NOT show any content inside the custom tag: it is empty.

So there seems to be a difference between what the browser displays and the actual page source (which is also what the "fetch" call retrieves), but I do not know how to access the content that the browser actually displays.
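One way to confirm that the fetched HTML is only an empty shell, filled in later by the page's own JavaScript, is a quick check on the raw response text. This is a minimal sketch; the tag name `app-history` used below is a hypothetical stand-in for the real custom tag, and a regular expression is only a rough check, not an HTML parser.

```javascript
// Returns true when `tagName` appears in the raw HTML with nothing but
// whitespace between its opening and closing tags, i.e. the element is a
// placeholder that client-side scripts populate after the page loads.
function isEmptyShell(html, tagName) {
  const pattern = new RegExp(`<${tagName}\\b[^>]*>\\s*</${tagName}>`, 'i');
  return pattern.test(html);
}
```

For example, `isEmptyShell(await response.text(), 'app-history')` returning `true` means no variation of the fetch request will fill that element, because the content only exists after the site's scripts run in a real browsing context.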

MY QUESTIONS:

  1. Is there anything I can do to force those "history" contents to be included in the "fetch" response and stored in my script variable?
  2. Do I need to (and can I) trigger any of the page's scripts so that they populate the variable? If so, which script(s)?
  3. Or is my only option to physically "flick" through the pages in the Chrome browser, as I initially did?

I'm a JavaScript newbie coming from (mediocre) Python development, so please excuse my ignorance.

JAM
  • No, this approach won't work with JavaScript-rendered sites. The solution is to load the site in an iframe inside an [offscreen document](https://developer.chrome.com/docs/extensions/reference/offscreen/). Your content script should have [all_frames](https://developer.chrome.com/docs/extensions/mv3/content_scripts/#frames) enabled so it runs inside the iframe. You might also need to [strip the DENY header](/a/69177790). – wOxxOm Aug 09 '23 at 04:42
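The offscreen-iframe approach in the comment above can be sketched roughly as follows for the extension's background service worker. This is an untested sketch under assumptions: the host name, `offscreen.html`, `scraper.js`, and the rule ID are hypothetical; `chrome.runtime.getContexts` needs Chrome 116 or later; and the header rule assumes the site blocks framing via `X-Frame-Options`, removing it only for sub-frames that the extension itself loads.

```javascript
// background.js (MV3 service worker). Host, file names, and rule ID are
// hypothetical placeholders.
// manifest.json is assumed to declare at least:
//   "permissions": ["offscreen", "declarativeNetRequest"],
//   "host_permissions": ["https://second-site.example/*"],
//   "content_scripts": [{ "matches": ["https://second-site.example/*"],
//                         "js": ["scraper.js"], "all_frames": true }]
const SITE_HOST = 'second-site.example';
const STRIP_RULE_ID = 1;

// Build a session rule that removes X-Frame-Options (e.g. "DENY") from the
// site's responses when they load as a sub-frame initiated by this extension,
// so the page may render inside the offscreen iframe. If the site also sends
// a CSP frame-ancestors directive, that header would need handling as well.
function buildStripFrameHeadersRule(ruleId, host, extensionId) {
  return {
    id: ruleId,
    priority: 1,
    action: {
      type: 'modifyHeaders',
      responseHeaders: [{ header: 'X-Frame-Options', operation: 'remove' }],
    },
    condition: {
      requestDomains: [host],
      initiatorDomains: [extensionId], // limit to frames this extension opens
      resourceTypes: ['sub_frame'],
    },
  };
}

async function installStripFrameHeadersRule() {
  await chrome.declarativeNetRequest.updateSessionRules({
    removeRuleIds: [STRIP_RULE_ID], // replace any earlier copy of the rule
    addRules: [buildStripFrameHeadersRule(STRIP_RULE_ID, SITE_HOST, chrome.runtime.id)],
  });
}

// Create the hidden offscreen document unless one already exists.
// offscreen.html (hypothetical) holds an <iframe> pointing at the history
// page; scraper.js then runs inside that iframe thanks to all_frames.
async function ensureOffscreenDocument() {
  const contexts = await chrome.runtime.getContexts({
    contextTypes: ['OFFSCREEN_DOCUMENT'],
  });
  if (contexts.length > 0) return;
  await chrome.offscreen.createDocument({
    url: 'offscreen.html',
    reasons: ['IFRAME_SCRIPTING'],
    justification: 'Render the history page in a hidden iframe so its scripts fill in the list.',
  });
}
```

Before a scan, the background script would `await installStripFrameHeadersRule()` and then `await ensureOffscreenDocument()`. Inside the iframe, `scraper.js` can check `window !== window.top`, wait until the custom tag has content (for example with a `MutationObserver`), and send the scraped rows back with `chrome.runtime.sendMessage`. Nothing is shown to the user, yet the site's scripts still run and render the history list.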

0 Answers