
I am working on a small web scraping application in Go, using the Colly web scraping framework (which is also written in Go).

Here is the HTML of the website:

<div clas="cc">  
    <div class="list">
        <span class="countrybg" style="background-image: url(countryimage);"></span>
        <span class="continet">Asia</span>
        <span class="country">india</span>
    </div>
    <div class="list">
        <span class="countrybg" style="background-image: url(countryimage);"></span>
        <span class="continet">Africa</span>
        <span class="country">Brazil</span>
    </div>
</div>   

Now I want to fetch each of the three span elements one by one and append their values to arrays.

I tried the code below, but it doesn't work: it returns the values joined together as "AsiaAfrica". I want the values separately, and I also want to get the image URL from the countrybg span.

c := make([]string, 10)
element.ForEach(".list span", func(_ int, elem *colly.HTMLElement) {
    result := element.ChildText("span:nth-child(2)")
    c = append(c, result)
})

The expected output should look like this:

countrybg = ['image1url', 'image2url']
continet = ['Asia', 'Africa']
country = ['india', 'Brazil']

Can anyone help me get this working?

Dinesh s
  • I don't actually know how colly works, but it looks like you're using `element` within the `element.ForEach` callback. Maybe you should use `elem`. – Alper Oct 22 '21 at 22:05

1 Answer


I ran a local server on port 8081 and tried getting the values you are looking for. There are many ways to do what you need; this is just one:

package main

import (
    "fmt"
    "regexp"

    "github.com/gocolly/colly"
)

func main() {
    c := colly.NewCollector()

    countrybgs := []string{}
    continents := []string{}
    countries := []string{}

    r := regexp.MustCompile(`background-image: url\((.*)\);`)

    /*
        <div clas="cc">
            <div class="list">
                <span class="countrybg" style="background-image: url(image1url);"></span>
                <span class="continet">Asia</span>
                <span class="country">india</span>
            </div>
            <div class="list">
                <span class="countrybg" style="background-image: url(image2url);"></span>
                <span class="continet">Africa</span>
                <span class="country">Brazil</span>
            </div>
        </div>
    */

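    c.OnHTML("span", func(e *colly.HTMLElement) {
        switch class := e.Attr("class"); class {
        case "countrybg":
            // Guard against a style attribute that doesn't match the pattern;
            // indexing [1] on a nil match would panic.
            if m := r.FindStringSubmatch(e.Attr("style")); m != nil {
                countrybgs = append(countrybgs, m[1])
            }
        case "continet":
            continents = append(continents, e.Text)
        case "country":
            countries = append(countries, e.Text)
        }
    })

    // Report connection or HTTP errors instead of silently printing empty slices.
    if err := c.Visit("http://localhost:8081"); err != nil {
        fmt.Println("visit failed:", err)
        return
    }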

    fmt.Println(countrybgs)
    fmt.Println(continents)
    fmt.Println(countries)
}

The output:

> go run .
[image1url image2url]
[Asia Africa]
[india Brazil]
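The answer's pattern `background-image: url\((.*)\);` only matches when the style uses exactly that spacing and ends with a semicolon, and it keeps any quotes around the URL (`url("x");` captures `"x"`). On a real page whose `style` attribute differs from the sample, `FindStringSubmatch` returns nil and `[1]` panics. A more forgiving helper is sketched below; `backgroundURL` is a hypothetical name, and it only returns the first `url(...)` and does not unescape CSS.

```go
package main

import (
	"fmt"
	"regexp"
)

// bgURLPattern matches url(...) in an inline style, allowing optional
// whitespace and optional single or double quotes around the URL.
// Group 1 is the bare URL.
var bgURLPattern = regexp.MustCompile(`url\(\s*['"]?([^'")]+?)['"]?\s*\)`)

// backgroundURL returns the first url(...) value in style and whether one was found.
func backgroundURL(style string) (string, bool) {
	m := bgURLPattern.FindStringSubmatch(style)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{
		"background-image: url(image1url);",
		`background-image:url("https://example.com/a.png")`,
		"background-image: url( 'image2url' );",
		"color: red;",
	} {
		u, ok := backgroundURL(s)
		fmt.Printf("%q -> %q (found: %v)\n", s, u, ok)
	}
}
```

In the answer's `countrybg` case, `if u, ok := backgroundURL(e.Attr("style")); ok { countrybgs = append(countrybgs, u) }` would replace the direct indexing.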
jabbson
  • Is there another way to do this? I am trying this, but it returns an empty array for me. If you could help with another way, that would be helpful. – Dinesh s Oct 25 '21 at 08:45
  • If this is a publicly available page, you can share it with us. – jabbson Oct 25 '21 at 11:17
  • I can't share the page, but the HTML is the same. Can you help me with another option? – Dinesh s Oct 25 '21 at 12:10
  • How are you checking the HTML, in the browser or from the Go code? This code should work, and if you run it the way I did you will see that it does, so something must be different in either the code you are running or the HTML you are running it against. – jabbson Oct 25 '21 at 12:12
  • I am only checking in the browser. I will check my code once more and get back to you if it doesn't work. – Dinesh s Oct 25 '21 at 12:20
  • When checking in the browser, try disabling JavaScript in the developer tools and see whether the content is still there. – jabbson Oct 25 '21 at 12:24