Using IMPORTXML to pull data from within a form

Question

I am trying to pull some content with IMPORTXML but my inexperience with XPATH is hindering me. Specifically, I am trying to pull the image and the description of a link preview that appears in linkfork.co. As an example, https://linkfork.co/preview?url=https%3A%2F%2Ftechcrunch.com%2F2021%2F06%2F03%2Fford-owned-spin-shakes-up-scooter-business-with-new-ceo-e-bikes-and-city-strategy%2F returns the following (I want what is in the red circle): example

Chrome's dev tools give the following XPATH for the image: //*[@id="image-container"]/img

This seems to be correct, as far as I can tell, but IMPORTXML returns the error "Imported content is empty"...

I tried a bunch of variations on the formula and all return the same error as long as I am trying to pull content from under "div class = p-4". Just to be sure I am not totally off-mark, I tried the following function to try and pull every bit of text I could from the page:

=IMPORTXML("https://linkfork.co/preview?url=https%3A%2F%2Ftechcrunch.com%2F2021%2F06%2F03%2Fford-owned-spin-shakes-up-scooter-business-with-new-ceo-e-bikes-and-city-strategy%2F","//*[text()]")

And it indeed pulled all the text from the page except what was under that div. I am just guessing here, but is it because that content is within an editable form? Is it possible to pull data from it? Any help is appreciated.
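
One possible explanation, which nothing in this thread confirms, is that the preview under that div is filled in by a script after the page loads. IMPORTXML reads only the HTML the server sends and does not run JavaScript, so an XPath that matches in Chrome's dev tools can still come back empty. The sketch below illustrates the idea in Python with two made-up snippets of markup. `STATIC_HTML`, `RENDERED_HTML`, and `find_images` are hypothetical names, and neither snippet is copied from linkfork.co.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: what a server might send before any script runs.
STATIC_HTML = """
<html><body>
  <h1>Link preview</h1>
  <div class="p-4">
    <div id="image-container"></div>
  </div>
</body></html>
"""

# Hypothetical markup: the same page after a browser script has added the image.
RENDERED_HTML = """
<html><body>
  <h1>Link preview</h1>
  <div class="p-4">
    <div id="image-container"><img src="https://example.com/preview.png" /></div>
  </div>
</body></html>
"""


def find_images(markup):
    """Return the src of every img directly under the element with id 'image-container'."""
    root = ET.fromstring(markup.strip())
    # Same shape as the XPath from dev tools, in ElementTree's XPath subset.
    matches = root.findall(".//*[@id='image-container']/img")
    return [img.get("src") for img in matches]


print(find_images(STATIC_HTML))    # no img element exists yet, so nothing matches
print(find_images(RENDERED_HTML))  # the img is present, so its src is returned
```

If that assumption holds, the XPath itself is fine and the container is simply empty in the HTML that IMPORTXML receives.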

It seems web scraping is not allowed in the site that you are targeting. I tried to access `https://linkfork.co/robots.txt` and the result is `User-agent: * Disallow:` which means web scraping is blocked as mentioned in this reference [post](https://www.quora.com/How-do-you-know-if-web-scraping-a-particular-website-is-legally-okay) — Ron M, Jun 04 '21 at 21:13
You can try exploring apps script to parse your target site. Here is a sample reference that might help you. [What is the best way to parse html in google apps script](https://stackoverflow.com/questions/19455158/what-is-the-best-way-to-parse-html-in-google-apps-script) — Ron M, Jun 04 '21 at 21:50
Thanks a ton again! I was actually trying to avoid g script for a bunch of reasons that are irrelevant here. At any rate, by changing my approach and not relying on link preview builders I managed to get what I wanted (tho my solution isn't scalable to every site ever, I have a limited set of sources). It was a good learning exercise. — Maldoror, Jun 05 '21 at 22:11
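
The last comment does not say what the new approach was. One common way to get a preview image and description without a link preview builder is to read the Open Graph `<meta>` tags from the article page itself. The following is a minimal sketch of that idea, assuming the source page publishes `og:image` and `og:description`. `OpenGraphParser`, `extract_preview`, and `SAMPLE_HEAD` are hypothetical names, and the sample markup is invented for illustration.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        # Keep the first value seen for each Open Graph property.
        if name.startswith("og:") and "content" in attributes:
            self.properties.setdefault(name, attributes["content"])


# Hypothetical page head, not copied from any real article.
SAMPLE_HEAD = """
<html><head>
<meta property="og:title" content="Example article" />
<meta property="og:image" content="https://example.com/cover.jpg" />
<meta property="og:description" content="A short summary of the article." />
</head><body></body></html>
"""


def extract_preview(markup):
    """Return (image URL, description), with None for any tag that is missing."""
    parser = OpenGraphParser()
    parser.feed(markup)
    return (
        parser.properties.get("og:image"),
        parser.properties.get("og:description"),
    )


print(extract_preview(SAMPLE_HEAD))
```

Because these tags sit in the static page head, the same values may also be reachable from IMPORTXML with an XPath such as `//meta[@property='og:image']/@content`, though that depends on the site serving them without JavaScript.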
