
I'm using go-colly to scrape data from a webpage.

I'm unable to parse out the src image from this nested HTML element.

    c.OnHTML(".result-row", func(e *colly.HTMLElement) {
        goquerySelection := e.DOM
        // Attr returns the attribute value and a bool reporting whether it exists.
        fmt.Println(goquerySelection.Find("img").Attr("src"))
...

The .result-row selector works for many other lookups, such as:

    link := e.ChildAttrs("a", "href")

and

    e.ChildText(".result-price")

How can I get the nested image src value?

Ryan

1 Answer


If I understood the question correctly, the following should do what you need. Here is the code:

package main

import (
    "fmt"
    "strings"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector(colly.AllowedDomains(
        "santabarbara.craigslist.org",
    ))

    c.OnRequest(func(r *colly.Request) {
        r.Headers.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")
    })

    c.OnResponse(func(r *colly.Response) {
        fmt.Println("Response Code:", r.StatusCode)
    })

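    // Runs for every img element on the page, not only those inside a listing.
    c.OnHTML("img", func(h *colly.HTMLElement) {
        imgSrc := h.Attr("src")
        // Swap whichever thumbnail size is present for the largest one.
        imgSrc = strings.Replace(imgSrc, "50x50c", "1200x900", 1)
        imgSrc = strings.Replace(imgSrc, "300x300", "1200x900", 1)
        imgSrc = strings.Replace(imgSrc, "600x450", "1200x900", 1)
        fmt.Println(imgSrc)
    })

    // Visit returns an error (for example, a disallowed domain or a failed request).
    if err := c.Visit("https://santabarbara.craigslist.org/apa/7570100710.html"); err != nil {
        fmt.Println("visit failed:", err)
    }
}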

The handler selects every img element on the page, then replaces the thumbnail size in each URL (50x50c, 300x300, or 600x450) with the largest one, which is 1200x900 in this case. I found these size names in a script tag near the bottom of the page.
The rest should be pretty straightforward. Let me know if this solves your issue or if you need something else, thanks!
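
If it helps, the size swap can be pulled out into a small helper so it can be checked without hitting the site. This is only a sketch: largestImageURL is a name I made up, the example URL is hypothetical, and the size tokens are the ones I saw on the page, so they may change. Note that strings.NewReplacer replaces every occurrence of a token, whereas the code above replaces only the first.

```go
package main

import (
	"fmt"
	"strings"
)

// sizeReplacer swaps any of the smaller size tokens for the largest one.
// The tokens are assumptions based on what the page exposed at the time.
var sizeReplacer = strings.NewReplacer(
	"50x50c", "1200x900",
	"300x300", "1200x900",
	"600x450", "1200x900",
)

// largestImageURL is a hypothetical helper: it returns src with its
// thumbnail size token replaced by 1200x900. URLs without a known
// token are returned unchanged.
func largestImageURL(src string) string {
	return sizeReplacer.Replace(src)
}

func main() {
	// Hypothetical URL, for illustration only.
	fmt.Println(largestImageURL("https://images.example.org/abc_50x50c.jpg"))
}
```

Inside the OnHTML callback you would then call largestImageURL(h.Attr("src")) instead of the three strings.Replace lines.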

ossan