0

I cannot get an entire table to populate with ImportXML. At best I get the first column and I cannot figure this out.

The website I am trying to scrape is: https://classic.warcraftlogs.com/character/us/kromcrush/chills

Do I have any options to retrieve the table rather it be column by column or as a whole?

I have tried all the following plus several others.

=IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","//table[@id='boss-table-1010']/tbody/tr")
=IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","//tbody/tr")
=IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","//tbody/tr/td")
=IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","//tr")
=IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","//tr/td")
=IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","//tr/td[1]")
=IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","//tr/td[2]")

Anything outside of column one says Imported content is empty. Please help!

P.S. I have scoured this website and google for answers and every case I find seems to be a syntax error, starting at the table itself doesn't return the entire table which tells me I need a clever method.

Kevin P.
  • 907
  • 7
  • 18

1 Answers1

0

It seems that's an issue with the website, because when you click on Inspect you can see the table with id "boss-table-1010" but if you click on View Source that ID is not available, so the table is dynamically rendered and Sheets doesn't find such id.

I've checked it and I can get the data by doing =IMPORTXML("https://classic.warcraftlogs.com/character/us/kromcrush/chills", "//table/tbody//td")

But if you want a more robust solution, it'll be better doing it programmatically by using Python for web scraping

David Salomon
  • 804
  • 1
  • 7
  • 24
  • You can get the entire table? I have ran that same formula many times and all it returns in the first column and none of the others. – Kevin P. Jan 11 '22 at 20:50
  • I'm not sure what happens with the website, but if you run `=IMPORTXML("https://www.w3schools.com/html/html_tables.asp","//table[@id='customers']/tbody/tr")` as an example, the same syntax works. I assuming is due to the fact the table doesn't change dynamically – David Salomon Jan 11 '22 at 21:16
  • I tried to do the same with `=IMPORTHTML("https://classic.warcraftlogs.com/character/us/kromcrush/chills","table",2)` but still, just getting the first column. I think you might need to use Google Apps Script to get the HTTP response and render data. – David Salomon Jan 11 '22 at 21:26
  • I did start playing around with the API but unfortunately it cannot return the elements in the table. You're suggesting there is a way to obtain the table in Apps Script with a HTTP response? – Kevin P. Jan 11 '22 at 21:45
  • 1
    I mentioned Apps Script because it's easier to program functions and return the results in a spreadsheet but tbh I haven't tried it, take a look at this thread on https://stackoverflow.com/questions/53683792/html-table-does-not-show-on-source-file they offer solutions like Selenium and more complicated stuff, and that's because of the way the website is designed. – David Salomon Jan 12 '22 at 01:03
  • 1
    I'm not familiar with Warcraft but I suppose they share that information via api, https://www.warcraftlogs.com/api/docs and as per their docs, they have the info available if you use GraphQL https://www.warcraftlogs.com/v2-api-docs/warcraft/ – David Salomon Jan 12 '22 at 01:06