
I am trying to pull some content with IMPORTXML, but my inexperience with XPath is holding me back. Specifically, I am trying to pull the image and the description of a link preview generated by linkfork.co. For example, https://linkfork.co/preview?url=https%3A%2F%2Ftechcrunch.com%2F2021%2F06%2F03%2Fford-owned-spin-shakes-up-scooter-business-with-new-ceo-e-bikes-and-city-strategy%2F returns a preview page, and I want the part circled in red (screenshot not included).
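The `url` parameter in that preview link is simply the article URL percent-encoded, with `:` and `/` escaped as `%3A` and `%2F`. A minimal Python sketch of building such a link follows. Python is used only for illustration, and the `linkfork.co/preview?url=` pattern is taken from the example above, not from any documented API.

```python
from urllib.parse import quote, unquote

# Article whose preview we want (taken from the example URL above).
article = ("https://techcrunch.com/2021/06/03/ford-owned-spin-shakes-up-"
           "scooter-business-with-new-ceo-e-bikes-and-city-strategy/")

# safe="" makes quote() escape ":" and "/" too, giving %3A and %2F.
preview_url = "https://linkfork.co/preview?url=" + quote(article, safe="")
print(preview_url)

# Decoding the parameter recovers the original article URL.
assert unquote(preview_url.split("url=", 1)[1]) == article
```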

Chrome's dev tools give the following XPath for the image: `//*[@id="image-container"]/img`
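As a sanity check of what that XPath selects, the sketch below runs the equivalent query with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The markup is a hypothetical stand-in for the rendered preview, not linkfork.co's actual HTML, and the image URL and description are made-up placeholders. Note that the query selects the `<img>` element itself; because an image has no text content, IMPORTXML typically needs the attribute selected explicitly (e.g. `//*[@id="image-container"]/img/@src`) to return the image URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the *rendered* preview.
rendered = """
<html><body>
  <div class="p-4">
    <div id="image-container"><img src="https://example.com/preview.jpg"/></div>
    <p>Example description</p>
  </div>
</body></html>
"""

root = ET.fromstring(rendered.strip())

# ElementTree's XPath subset: ".//*[@id='image-container']/img" is the
# relative-path form of //*[@id="image-container"]/img.
imgs = root.findall(".//*[@id='image-container']/img")

# ElementTree cannot end a path with @src, so read the attribute directly;
# with IMPORTXML the equivalent would be appending /@src to the XPath.
srcs = [img.get("src") for img in imgs]
print(srcs)  # ['https://example.com/preview.jpg']
```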

As far as I can tell this is correct, but IMPORTXML returns the error "Imported content is empty".

I tried several variations of the formula, and all of them return the same error whenever I try to pull content from inside `<div class="p-4">`. To make sure I am not completely off the mark, I tried the following formula to pull every bit of text I could from the page:

=IMPORTXML("https://linkfork.co/preview?url=https%3A%2F%2Ftechcrunch.com%2F2021%2F06%2F03%2Fford-owned-spin-shakes-up-scooter-business-with-new-ceo-e-bikes-and-city-strategy%2F","//*[text()]")
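`//*[text()]` selects every element that has at least one text node as a direct child. ElementTree does not support the `text()` test, so the sketch below approximates it by hand (ignoring whitespace-only text) on a hypothetical page fragment that is an assumption, not linkfork.co's real markup. Elements with no text of their own, such as an empty container `<div>` or an `<img>`, never appear in the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment (an assumption, not linkfork.co's real HTML).
page = """
<html><body>
  <h1>Link preview</h1>
  <div class="p-4"><div id="image-container"/></div>
  <footer>Footer text</footer>
</body></html>
"""

def elements_with_text(root):
    """Approximate //*[text()]: yield (tag, text) for elements that have a
    non-blank text node directly inside them (before or between children)."""
    for el in root.iter():
        # el.text is the text before the first child; each child's .tail is
        # the text that follows that child inside el.
        own = [t.strip() for t in [el.text, *(c.tail for c in el)] if t and t.strip()]
        if own:
            yield el.tag, " ".join(own)

hits = list(elements_with_text(ET.fromstring(page.strip())))
print(hits)  # [('h1', 'Link preview'), ('footer', 'Footer text')]
```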

And it did pull all the text from the page except what was under that div. I am only guessing here, but is it because that content is inside an editable form? Is it possible to pull data from it at all? Any help is appreciated.
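The symptom is also consistent with another explanation: IMPORTXML parses the HTML exactly as the server sends it and does not run JavaScript, whereas Chrome's dev tools show the live DOM after scripts have run. If the preview image and description are inserted client-side (an assumption; comparing the page's View Source with the Elements panel would confirm or rule it out), the same XPath matches in the browser but finds nothing in the fetched HTML. Both HTML strings below are hypothetical illustrations of that situation.

```python
import xml.etree.ElementTree as ET

# ElementTree form of the XPath from Chrome's dev tools.
XPATH = ".//*[@id='image-container']/img"

# ASSUMPTION: the HTML the server sends contains an empty container that the
# browser later fills in with JavaScript. IMPORTXML only sees this version.
served = '<html><body><div class="p-4"><div id="image-container"></div></div></body></html>'

# ASSUMPTION: roughly what Chrome's Elements panel shows after scripts ran.
rendered = ('<html><body><div class="p-4"><div id="image-container">'
            '<img src="https://example.com/preview.jpg"/></div></div></body></html>')

def match_count(html: str) -> int:
    """Number of elements the XPath matches in a static HTML string."""
    return len(ET.fromstring(html).findall(XPATH))

print(match_count(served))    # 0: nothing to import, hence an "empty" result
print(match_count(rendered))  # 1: the element dev tools pointed at
```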

Ken White
Maldoror
    It seems web scraping is not allowed on the site you are targeting. I tried to access `https://linkfork.co/robots.txt` and the result is `User-agent: * Disallow:`, which means web scraping is blocked, as mentioned in this reference [post](https://www.quora.com/How-do-you-know-if-web-scraping-a-particular-website-is-legally-okay) – Ron M Jun 04 '21 at 21:13
  • Thanks @RonM, I guess it's a dead end, then. – Maldoror Jun 04 '21 at 21:47
  • You can try exploring Apps Script to parse your target site. Here is a sample reference that might help you: [What is the best way to parse html in google apps script](https://stackoverflow.com/questions/19455158/what-is-the-best-way-to-parse-html-in-google-apps-script) – Ron M Jun 04 '21 at 21:50
  • Thanks a ton again! I was actually trying to avoid Apps Script for a bunch of reasons that are irrelevant here. At any rate, by changing my approach and not relying on link preview builders, I managed to get what I wanted (though my solution doesn't scale to every site, since I have a limited set of sources). It was a good learning exercise. – Maldoror Jun 05 '21 at 22:11
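Regarding the robots.txt rules quoted in the first comment: under the robots exclusion standard, a `Disallow:` line with an empty value disallows nothing, so `User-agent: *` followed by an empty `Disallow:` permits crawling of the whole site; `Disallow: /` is the form that blocks it. Python's standard `urllib.robotparser` can check how a given set of rules is interpreted. The sketch below parses the rules from memory (no network access), and the `allows` helper is a hypothetical name introduced only for this illustration.

```python
from urllib.robotparser import RobotFileParser

def allows(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    rp = RobotFileParser()
    rp.parse(robots_lines)  # parse rules from a list of lines, no download
    return rp.can_fetch(agent, url)

preview = "https://linkfork.co/preview?url=https%3A%2F%2Ftechcrunch.com%2F"

# The rules quoted in the comment: an empty Disallow value blocks nothing.
print(allows(["User-agent: *", "Disallow:"], preview))    # True
# For contrast, "Disallow: /" blocks every path on the site.
print(allows(["User-agent: *", "Disallow: /"], preview))  # False
```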
