I'm trying to rewrite a web crawler in Go (it was originally written in Python with gevent), but I've hit a wall: no matter what I do, memory consumption climbs very quickly. For example, take the following simple code:
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// readLine reads up to 500 lines from the input file and sends each
// non-empty, trimmed line on the domains channel.
func readLine(in *bufio.Reader, domains chan<- string) {
	for conc := 0; conc < 500; conc++ {
		input, err := in.ReadString('\n')
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Fprintf(os.Stderr, "read(stdin): %s\n", err)
			os.Exit(1)
		}
		input = strings.TrimSpace(input)
		if input == "" {
			continue
		}
		domains <- input
	}
}

// get receives a single URL from the channel, fetches it, and closes the
// response body if the request succeeded.
func get(domains <-chan string) {
	url := <-domains
	resp, err := http.Get(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(url, " OK")
	resp.Body.Close()
}

func main() {
	domains := make(chan string, 500)
	inFile, _ := os.Open("F:\\PATH\\TO\\LIST_OF_URLS_SEPARATED_BY_NEWLINE.txt")
	in := bufio.NewReader(inFile)
	for {
		// Every iteration starts one reader goroutine and 500 get goroutines,
		// then sleeps for 100 ms before doing it all again.
		go readLine(in, domains)
		for i := 0; i < 500; i++ {
			go get(domains)
		}
		time.Sleep(100 * time.Millisecond)
	}
}
I've tried pprof, but it reports that I'm only using about 50 MB of heap, while the process's memory usage shown by the OS resource monitor keeps skyrocketing.
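For reference, this is roughly how I'm capturing the heap profile (a minimal sketch; the dump file name and the point where I call it are just what I happened to use):

import (
	"os"
	"runtime/pprof"
)

// dumpHeap writes the current heap profile to a file that can be
// inspected later with `go tool pprof`.
func dumpHeap() {
	f, err := os.Create("heap.pprof") // file name is arbitrary
	if err != nil {
		panic(err)
	}
	defer f.Close()
	pprof.WriteHeapProfile(f)
}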
I also tried creating a custom http.Transport with keep-alives disabled, since I found out that net/http keeps connections around for reuse, but that didn't help either.
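This is roughly the Transport variant I tried (a sketch from memory; the client timeout is just an example value, not something from the code above):

import (
	"net/http"
	"time"
)

// Shared client with keep-alives disabled, so idle connections are not
// cached for reuse between requests.
var client = &http.Client{
	Transport: &http.Transport{
		DisableKeepAlives: true,
	},
	Timeout: 10 * time.Second, // example value
}

In get() I then called client.Get(url) instead of http.Get(url), but the memory behaviour was the same.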