Questions tagged [go-colly]

colly is a web scraping framework written in Go. Import it as github.com/gocolly/colly (import paths take no https:// scheme; from v2 on, the module path is github.com/gocolly/colly/v2). You will typically use this tag together with the main tag [go].

63 questions
4 votes, 1 answer

Go Colly not returning any data from website

I am trying to make a simple web scraper in Go, and I can't seem to get even the most basic functionality working with colly. I took the basic example from the colly docs, and while it worked with the hackernews.org site they used, it isn't working with the site I…
Cade
3 votes, 1 answer

Add colly package output text to a map in Golang

I was making a web scraper with the colly package that collects the ContestName and ContestTime from a website and writes them to a JSON file, so I did this: Contests := make(map[string]map[string]map[string]map[string]string) …
3 votes, 1 answer

Get values from elements with the same class name in colly web scraping

I am working on a small web scraping application using the Go language and the colly web scraping framework, which is built in Go. Here is the HTML code of the website
Dinesh s
3 votes, 0 answers

Passing cookies from Go Rod (Headless browser) to requests, Colly cookiejar

I am trying to pass cookies from a headless browser in Go to the requests package's cookiejar. There are some JS-generated cookies that I need to grab using the headless browser and then pass to the requests module. I currently have this to export…
AntBox
3 votes, 1 answer

How to use selectors properly

I'm writing a crawler to retrieve some data from some pages. The logic of how to build it is very clear to me, but I am very confused about how to use selectors properly. I would like to get the titles of some news items using colly. I went to the page…
MrByte
2 votes, 1 answer

How to stop go colly from printing "Max depth limit reached"

I have a go colly crawler that I am using to crawl many sites. On my terminal it prints a lot of: 2023/05/30 02:22:56 Max depth limit reached 2023/05/30 02:22:56 Max depth limit reached 2023/05/30 02:22:56 Max depth limit reached 2023/05/30…
Farshad
2 votes, 1 answer

Scraping all possible tags and putting them into one variable using Go Colly

I need to scrape different tags from a list of sites, put them in a variable, and then put them in a .csv list. For example, all lines where the author of the article is mentioned (div.author, p.author, etc.). On all sites, the location of this line and the…
2 votes, 1 answer

Max rate limit of StackOverflow

I have been trying to access StackOverflow at 30 requests per second, but it is not working: I get blocked after a few seconds, although the StackOverflow documentation says the max rate limit of StackExchange is 30 req/s. The…
2 votes, 1 answer

Web scraping using Golang Colly: how to handle XML path not found?

I am using Colly to scrape an e-commerce website. I will loop over many products. Here is a snippet of my code getting a sub-title: c.OnXML("/html/body/div[4]/div/div[3]/div[2]/div/div[1]/div[3]/div/div/h1/1234", func(e *colly.XMLElement) { …
Chau Loi
2 votes, 1 answer

Go Colly: how to find the requested element?

I'm trying to get a specific table and loop through its content using colly, but the table is not being recognized. Here's what I have so far: package main import ( "fmt" "github.com/gocolly/colly" ) func main() { c :=…
Lynx
2 votes, 1 answer

How do I scrape TLS certificates using go-colly?

I am using Colly to scrape a website, and I am also trying to get the TLS certificate that the site presents during the TLS handshake. I looked through the documentation and the response object but did not find what I was looking for. According…
2 votes, 1 answer

Go Colly parallelism decreases the number of links scraped

I am trying to build a web scraper to scrape jobs from internshala.com. I am using Go Colly to build the web scraper. I visit every page and then visit the subsequent links of each job to scrape data from. Doing this sequentially scrapes…
Adnan
2 votes, 0 answers

Web scraping site using polymerjs / webcomponent

I'm using colly to web scrape YouTube charts. This site uses polymerjs, and as a result I'm having issues capturing the DOM elements. A simple test I did was running document.querySelector("#search-native") in the console, and it returns null. I saw an…
Jess
2 votes, 1 answer

What can the go-colly library do?

Can the go-colly library crawl all HTML tags and text content under a div tag? If so, how? I can get all the text under a div tag, like this: c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) { text =…
N Fx
2 votes, 1 answer

Parsing nested elements using go-colly scraper

I'm using go-colly to scrape data from a webpage, but I'm unable to parse out the image src from this nested HTML element: c.OnHTML(".result-row", func(e *colly.HTMLElement) { qoquerySelection := e.DOM …
Ryan