0

This question has already been asked multiple times with good answers compiled at

Scraping data to Google Sheets from a website that uses JavaScript,

but I could not find a working answer to my problem.

This Xpath works in the browser :

//td[1]//span[contains(@class,"cluster")]/text()

Screenshots:

enter image description here

However Google sheets returns imported content empty:

Jquery Bubbling

jQuery:

JQery

From the Scraping data to Google Sheets from a website that uses JavaScript answer and cited questions it seems IMPORTXML cannot be used to solve my problem because the page loads jQuery.

The Xpath being correct, I don't know what else to try.

It seems from this answer scrape table using google app scripts a Google Apps Script solution could be used.

Is that approach still valid?

Or maybe a simpler one is available now someone could suggest?

Given that url:

https://www.onelook.com/thesaurus/?s=active

From these answers it seems parsing json could be a simpler solution, however I did not find a workable json file from searching with the developer tool network tab ( only this url was available):

is there a way to tell if there is json on a website

Scrape the current share price data from the ASX

Is there a way to get a single response from a text/event-stream without using event listeners?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Lod
  • 657
  • 1
  • 9
  • 30
  • 1
    I have to apologize for my poor English skill. Unfortunately, from your question, I couldn't understand your expected value from the URL of `https://www.onelook.com/thesaurus/?s=active`. Can I ask you about the detail of it? – Tanaike Aug 19 '23 at 12:42
  • All the values from the "zone1" spans table. `On`, `alive`, `existent`, `existing` etc. https://i.imgur.com/UeZDna6.png – Lod Aug 19 '23 at 12:47
  • Thank you for replying. From your reply, I proposed a sample script for retrieving your expected values. Please confirm it. If I misunderstood your question and that was not useful, I apologize. – Tanaike Aug 19 '23 at 13:00

1 Answers1

1

From your following reply,

All the values from the "zone1" spans table. "On", "alive", "existent", "existing" etc.

I understood that you want to retrieve each word like On", "alive", "existent", "existing" from a table of the URL.

When I saw your URL, it seems that the table is created by Javascript. In this case, unfortunately, the table cannot be directly retrieved using IMPORTXML and Google Apps Script. But, fortunately, it seems that the data for creating the table can be retrieved by other URL. And, it seems that the data is JSON data. From this situation, in this answer, I would like to propose a sample script for retrieving your expected values using Google Apps Script. So, how about the following sample script?

Sample script:

function myFunction() {
  const url = "https://www.onelook.com/api/words?ml=active&qe=ml&md=dpfcy&max=850&rif=1&k=olthes_r4";
  const res = UrlFetchApp.fetch(url);
  const obj = JSON.parse(res.getContentText());
  const values = obj.map(({ word }) => word);
  console.log(values); // ["active","on","alive","existent","existing","dynamic","hyperactive",,,]
}
  • When this script is run, the values of ["active","on","alive","existent","existing","dynamic","hyperactive",,,] are shown in the log. I thought that these values might be your expected values.

Note:

When you want to retrieve the table on Spreadsheet, how about the following sample script?

function SAMPLE() {
  const url = "https://www.onelook.com/api/words?ml=active&qe=ml&md=dpfcy&max=850&rif=1&k=olthes_r4";
  const res = UrlFetchApp.fetch(url);
  const obj = JSON.parse(res.getContentText());
  const values = obj.map(({ word }) => word).splice(1, 100);
  const v = [...Array(Math.ceil(values.length / 20))].map(_ => values.splice(0, 20));
  return v[0].map((_, col) => v.map((row) => row[col] || null));
}

When this script is used as a custom function, the following result is obtained. It seems that this is the same values with the HTML table.

enter image description here

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • thanks it works great. But just another question if possible. How if we'd want to get the `zone1`, `zone2`, `zone3` etc. text values separately? Is it possible to specify the tables separately? For example equivalent output from those `Xpaths` `//td[1]//span[contains(@class,"cluster")]/text()` `//td[2]//span[contains(@class,"cluster")]/text()` `//td[3]//span[contains(@class,"cluster")]/text()` etc. Thanks again! – Lod Aug 19 '23 at 13:23
  • 1
    @Lod Thank you for replying. I'm glad your issue was resolved. About your 2nd question of `But just another question if possible. How if we'd want to get the zone1, zone2, zone3 etc. text values separately? Is it possible to specify the tables separately? For example equivalent output from those Xpaths //td[1]//span[contains(@class,"cluster")]/text() //td[2]//span[contains(@class,"cluster")]/text() //td[3]//span[contains(@class,"cluster")]/text() etc.`, I would like to support you. But, I have to apologize for my poor English skill. Unfortunately, I cannot understand your 2nd question. – Tanaike Aug 19 '23 at 13:26
  • @Lod Can you provide your expected values of your 2nd question? – Tanaike Aug 19 '23 at 13:27
  • Amazing, thanks alot for the 2nd sprict too I just tested and it works great. I'll provide the expected values now. – Lod Aug 19 '23 at 13:29
  • 1
    @Lod Thank you for replying. Now, I have to away from here because I have to sleep. When I saw the expected value of your 2nd question tomorrow, I would like to try to think of a solution. I deeply apologize for my time difference and apologize that I will sleep. If you can permit me to sleep, I'm glad. – Tanaike Aug 19 '23 at 13:32
  • sure thanks again a lot for the great ideas, that helped a lot! Sure you next answer can wait tomorrow. Have a good night sleep! – Lod Aug 19 '23 at 13:42
  • 1
    @Lod Thank you for replying. I came back here. Can you provide the value you want to retrieve in your 2nd question? – Tanaike Aug 19 '23 at 23:34
  • thanks for the reply and sorry for my late return but I had to do some testing to come to the new specific idea. Here is the needed values output in Sheet "to_Columns" https://docs.google.com/spreadsheets/d/1od2KKauRoY4fpwBvGGKy3KsuTuxcdpqQMBZrcgWogGM/edit#gid=1932250953 Here is the values summary: https://pastecode.io/s/qnhu57rk Basically the idea would be to retrieve each zone values as in Sheet "Data_Scraped" And if possible all the values if some zone "Next results" button. For example all the 420 value in zone1 inseat of the 92 https://i.imgur.com/ZowmvCr.png – Lod Aug 20 '23 at 12:31
  • @Lod Thank you for replying. I have to apologize for my poor English skill. Unfortunately, from your reply, I cannot understand your 2nd question. But, I would like to support you. So, can you post it as a new question by including more information for understanding correctly? By this, it will help users including me think of a solution. If you can cooperate to resolve your 2nd question, I'm glad. Can you cooperate to do it? – Tanaike Aug 20 '23 at 23:15
  • Thanks @Tanaike for the answer. Sorry for my late reply, I was busy with a python project. For now I will study the json from the api and come back with a new question asap. I also found this while researching: https://github.com/wtetsu/mouse-dictionary and wanted to share. Till soon. – Lod Aug 26 '23 at 13:24