2

I would like to import data from an HTML page (specifically, https://www.warzone.com/MultiPlayer/Tournament?ID=34581) into a Google Sheets sheet. Normally this would be accomplished using IMPORTHTML (or, in more hacky ways, IMPORTXML or IMPORTDATA).


I want to get the table contents from the Players tab on the URL above. In my sheet, this should look like (with |'s separating cells)

Accepted Player       | Team | Rank      | Wins | Losses | In-progress
JSA                   |      | 1st place | 6    | 0      | 0
hexe1309              |      | 2nd place | 5    | 1      | 0
Sebus Maximus         |      | 3rd place | 3    | 3      | 0
Master Jz             |      | 4th place | 2    | 4      | 0
Blortis               |      | 5th place | 2    | 4      | 0
Rex Paul of Lizardino |      | 6th place | 2    | 4      | 0
Benjamin628           |      | 7th place | 1    | 5      | 0

But when I run IMPORTHTML(<the URL>, "table", 2), I instead get just the top row:

Accepted Player       | Team | Rank      | Wins | Losses | In-progress

If I use IMPORTXML or IMPORTDATA, I still do not get any contents from the populated table- just the column headers.


If you view page source on the URL above, you'll notice that, not coincidentally, only the part of this table is populated. The issue is that the initial request made when querying the page of interest returns a template- a page with empty tables. So IMPORTHTML, IMPORTXML, and IMPORTDATA return a page with tables that have just the headers populated- but no contents.

The empty tables don't yet have the data that I want; that gets filled in by some jQuery code. Presumably that jQuery code is fetching the data from somewhere before populating the table, but I don't see any requests in my browser's Network tab corresponding to this. I don't understand how web applications work or how to find out where this data is being fetched from.

From my perspective, my options are:

  1. To somehow have Google Sheets (like a headless browser) fetch the data from after the jQuery logic has populated the table, rather than from the initial page returned by the GET request; OR

  2. To somehow run/replicate the functionality of whatever the jQuery is doing, and fetch that data myself

I'm guessing #2 is more viable, since IMPORTHTML/IMPORTXML/IMPORTDATA are not that sophisticated. For #2, I don't have the debugging know-how to trace this website's jQuery code (or to find the place where fetches the data to fill in the table). Are there any devtools (in Firefox or Chrome) I can use to find out what happens behind the scenes to fetch this data and replicate that logic in my sheet? (My main gap here is almost certainly my lack of understanding of how this data is being fetched and where to find the trace of this fetching.)

n.b.: This could probably be done very easily by just using a script to update the sheet. I'll probably end up doing this with Python and gspread. But I'd like to know if it's actually impossible/infeasible to accomplish this in Google Sheets alone.

TheMaster
  • 45,448
  • 6
  • 62
  • 85
  • What have you tried so far? show us the method you used in google sheets. Also you didn't specify the expected output. – Marios Sep 04 '20 at 08:02
  • Thanks so much @Marios; I've edited that in as well as the expected output. Please let me know if you see any other ways to make this question clearer, more specific, or more concrete. – garbledamphimorph Sep 04 '20 at 09:02
  • 1
    Contact them to see if they provide a endpoint api. If not and they're ok with scrapping, Try the network tab. See if there are specific requests to a api endpoint. – TheMaster Sep 04 '20 at 14:07
  • @TheMaster: the Network tab does not show any requests that query the data in the table. There are requests _afterward_ where some supplemental data (images associated with the fetched data) get loaded but nothing that actually queries the players table. I'm not familiar with web technologies so part of what I'm wondering is what non-request-based approaches there might be for the jQuery logic to be populating this table. – garbledamphimorph Sep 04 '20 at 18:54
  • The data seems to be directly loaded, but encryped, while javascript unencrypts it and populates the table. That's just a guess. Anyway, searching their wiki provided the api https://www.warzone.com/wiki/Query_game_API You can just use a custom function like `importjson` with the api. – TheMaster Sep 04 '20 at 19:11
  • Related: https://stackoverflow.com/a/34828465 – TheMaster Sep 04 '20 at 19:32

1 Answers1

0

Google Sheets can't be used as a headless browser because IMPORTDATA/IMPORTHTML/IMPORTXML only are able to see the source code pointed by the URL, they can't see linked resources and can't see the resulting DOM of the JavaScript execution by the load and other events.

Also Google Sheets can't run/replicate the functionally of jQuery without using Google Apps Script or the Google Sheets API and your favorite programming language.

If you are open to use Google Apps Script, as it was mentioned by TheMaster, the straightforward way is to use an API that offers the content that you are looking but you might also try to replicate the HTTP requests made by jQuery. This could be done by using the Google Apps Script URL Fetch service.

Rubén
  • 34,714
  • 9
  • 70
  • 166