I am trying to create an HTTP client program in Go that makes many HTTP GET requests. I am using a buffered channel to limit the number of concurrent requests.

When I run the program, I get:

Get http://198.18.96.213/: dial tcp 198.18.96.213:80: too many open files

Here is my program:

package main

import (
    "fmt"
    "net/http"
    "time"
)

func HttpGets(numRequests int, numConcurrent int, url string) map[int]int {
    // I want number of requests with each status code
    responseStatuses := map[int]int{
        100: 0, 101: 0, 102: 0, 200: 0, 201: 0, 202: 0, 203: 0, 204: 0, 205: 0,
        206: 0, 207: 0, 208: 0, 226: 0, 300: 0, 301: 0, 302: 0, 303: 0, 304: 0,
        305: 0, 306: 0, 307: 0, 308: 0, 400: 0, 401: 0, 402: 0, 403: 0, 404: 0,
        405: 0, 406: 0, 407: 0, 408: 0, 409: 0, 410: 0, 411: 0, 412: 0, 413: 0,
        414: 0, 415: 0, 416: 0, 417: 0, 421: 0, 422: 0, 423: 0, 424: 0, 425: 0,
        426: 0, 427: 0, 428: 0, 429: 0, 430: 0, 431: 0, 500: 0, 501: 0, 502: 0,
        503: 0, 504: 0, 505: 0, 506: 0, 507: 0, 508: 0, 509: 0, 510: 0, 511: 0,
    }

    reqsDone := 0
    ch := make(chan *http.Response, numConcurrent)
    for i := 0; i < numRequests; i++ {
        go func(url string) {
            client := &http.Client{}
            req, reqErr := http.NewRequest("GET", url, nil)
            if reqErr != nil {
                fmt.Println(reqErr)
            }
            // adding connection:close header hoping to get rid 
            // of too many files open error. Found this in http://craigwickesser.com/2015/01/golang-http-to-many-open-files/           
            req.Header.Add("Connection", "close") 

            resp, err := client.Do(req)
            if err != nil {
                fmt.Println(err)
            }
            ch <- resp

        }(url)
    }

    for {
        select {
        case r := <-ch:
            reqsDone += 1 // probably needs a lock?
            status := r.StatusCode          
            if _, ok := responseStatuses[status]; ok {
                responseStatuses[status] += 1           

            } else {
                responseStatuses[status] = 1
            }
            r.Body.Close() // trying to close body hoping to get rid of too many files open error 
            if reqsDone == numRequests {
                return responseStatuses
            }
        }
    }
    return responseStatuses
}

func main() {
    var numRequests, numConcurrent = 500, 10
    url := "http://198.18.96.213/"
    beginTime := time.Now()
    results := HttpGets(numRequests, numConcurrent, url)
    endTime := time.Since(beginTime)
    fmt.Printf("Total time elapsed: %s\n", endTime)
    for k, v := range results {
        if v != 0 {
            fmt.Printf("%d : %d\n", k, v)
        }
    }

}

How do I ensure files/sockets are closed so that I don't get this error when making multiple requests?

Bharat
  • @Bravada Zadada, how do I make `numRequests` requests with only `numConcurrent` of them being concurrent while the rest wait? I thought this could be accomplished by limiting the channel size, so that when the channel is full it'll block until there is room again. – Bharat Jul 20 '15 at 23:43
  • 1
    @Bharat the problem is your ` go func(url string) {}` will keep opening connections and block on sending the resp, so you will have few hundred open connections in the same time until your reader starts closing them. – OneOfOne Jul 20 '15 at 23:50
  • @OneOfOne, then how do I implement the goroutine for making requests? I am new to Go and still not very clear about how concurrency works. – Bharat Jul 20 '15 at 23:54
  • @Bharat I added an example; you will want to spawn X processing goroutines and use a channel to send them the urls to process. – OneOfOne Jul 21 '15 at 00:07
  • BTW (unrelated to your question which others have answered): You don't need a lock on `reqsDone` since it's only used by a single goroutine (if instead you had it in the goroutine doing the requests then you could use `sync/atomic` to increment and load the counter safely). There is no need to initialize the map and check for key existence; just do `rs := make(map[int]int)` and `rs[status]++` ([example](https://play.golang.org/p/kGkOd3lVOd)). – Dave C Jul 21 '15 at 16:40
  • @DaveC, thanks for letting me know about using locks. I am trying to understand Go and wrote this program to play around with concurrency. Lots of new concepts which I'm not used to :) And I initialized the map because I got an error, `panic: assignment to entry in nil map`, when I tried to increment by doing `responseStatuses[status] += 1`. – Bharat Jul 21 '15 at 17:44
  • @Bharat, I should have said, no need to initialize the map with data. You can just initialize it as an empty map. Unless you explicitly check for existence you get back the zero value which makes things like `if boolMap[key]`, `sliceMap[key] = append(sliceMap[key], value)`, and `counterMap[key]++` all work well. Unless you really need/want a zero valued entry in the map for some key(s). – Dave C Jul 22 '15 at 13:44
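
As a minimal sketch of the counter-map idiom Dave C describes above (the `statusCounts` name and the sample status codes are illustrative, not taken from the original program):

package main

import "fmt"

func main() {
    // No need to pre-populate every status code: a missing key reads as the
    // zero value (0 for int), so incrementing it just works.
    statusCounts := make(map[int]int)
    for _, status := range []int{200, 200, 404, 200, 500} {
        statusCounts[status]++ // creates the entry on first increment
    }
    for code, n := range statusCounts {
        fmt.Printf("%d : %d\n", code, n)
    }
}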

2 Answers


Basically you were spawning hundreds of goroutines that would all open connections and then block until those connections were closed.

Here's a quick (and very ugly) working version:

var (
    responseStatuses = make(map[int]int, 63)
    reqsDone         = 0

    urlCh = make(chan string, numConcurrent)
    ch    = make(chan *http.Response, numConcurrent)
)
log.Println(numConcurrent, numRequests, len(responseStatuses))
for i := 0; i < numConcurrent; i++ {
    go func() {
        // Each worker handles one url at a time, so at most numConcurrent
        // requests are in flight at once.
        for url := range urlCh {
            client := &http.Client{}
            req, reqErr := http.NewRequest("GET", url, nil)
            if reqErr != nil {
                fmt.Println(reqErr)
                ch <- nil // keep one send per url so the reader's count stays correct
                continue
            }
            // adding connection:close header hoping to get rid
            // of too many files open error. Found this in http://craigwickesser.com/2015/01/golang-http-to-many-open-files/
            req.Header.Add("Connection", "close")

            resp, err := client.Do(req)
            if err != nil {
                fmt.Println(err)
            }
            ch <- resp // resp is nil when err != nil; the reader must check for that
        }
    }()
}
go func() {
    for i := 0; i < numRequests; i++ {
        urlCh <- url
    }
    close(urlCh)
}()

playground
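
For completeness, here is one way those pieces could be assembled into a self-contained program. This is a reconstruction, not necessarily the linked playground code: the nil-response handling and the counting reader loop are assumptions about what the consuming side should look like.

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    numRequests, numConcurrent := 500, 10
    url := "http://198.18.96.213/"
    begin := time.Now()

    urlCh := make(chan string, numConcurrent)
    ch := make(chan *http.Response, numConcurrent)

    // numConcurrent workers, each handling one request at a time, so at
    // most numConcurrent sockets are open at once.
    for i := 0; i < numConcurrent; i++ {
        go func() {
            client := &http.Client{}
            for u := range urlCh {
                req, err := http.NewRequest("GET", u, nil)
                if err != nil {
                    fmt.Println(err)
                    ch <- nil
                    continue
                }
                req.Header.Add("Connection", "close")
                resp, err := client.Do(req)
                if err != nil {
                    fmt.Println(err)
                    ch <- nil
                    continue
                }
                ch <- resp
            }
        }()
    }

    // Feed the URLs; sends block whenever urlCh is full, which is the
    // backpressure that keeps only numConcurrent requests in flight.
    go func() {
        for i := 0; i < numRequests; i++ {
            urlCh <- url
        }
        close(urlCh)
    }()

    // Every URL produces exactly one send on ch (a response or nil), so
    // reading numRequests values drains everything.
    statuses := make(map[int]int)
    for i := 0; i < numRequests; i++ {
        if r := <-ch; r != nil {
            statuses[r.StatusCode]++
            r.Body.Close() // always close the body so the socket is released
        }
    }

    fmt.Printf("Total time elapsed: %s\n", time.Since(begin))
    for k, v := range statuses {
        fmt.Printf("%d : %d\n", k, v)
    }
}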

OneOfOne
  • Thanks! In the 2nd goroutine you are writing urls into `urlCh`, and in the 1st goroutine you are reading those urls from `urlCh`. The size of `urlCh` is `numConcurrent`, which is 10, but the for loop in the 2nd goroutine iterates up to 500. So after every 10 urls written, that goroutine would do nothing until they are consumed by the 1st goroutine. And the 1st goroutine keeps running until the channel is closed. Is this right? – Bharat Jul 21 '15 at 05:28
  • @Bharat the 2nd goroutine would block until there's room for it to push another url. – OneOfOne Jul 21 '15 at 21:17
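
To see the blocking behavior OneOfOne describes in isolation, here is a tiny standalone sketch (illustrative only, not from either answer): the sender fills the two-slot buffer immediately, then blocks until the receiver starts draining it.

package main

import (
    "fmt"
    "time"
)

func main() {
    ch := make(chan int, 2) // buffered channel with room for two values
    go func() {
        for i := 0; i < 5; i++ {
            ch <- i // blocks while the buffer already holds 2 unread values
            fmt.Println("sent", i)
        }
        close(ch)
    }()
    time.Sleep(time.Second) // give the sender time to fill the buffer and block
    for v := range ch {
        fmt.Println("received", v)
    }
}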

You can use the following library:

Requests: a Go library to reduce the headache of making HTTP requests (20k/s req)

https://github.com/alessiosavi/Requests

The idea is to allocate a list of requests, then send them with a configurable "parallel" factor that allows only N requests to run at a time.

// This slice will contain the list of requests
var reqs []requests.Request

// N is the number of requests to run in parallel. To avoid "too many open
// files" errors, N has to be lower than the ulimit threshold.
var N int = 12

// Create the list of requests
for i := 0; i < 1000; i++ {
    // In this case, we init 1000 requests with the same URL, METHOD, BODY, HEADERS
    req, err := requests.InitRequest("https://127.0.0.1:5000", "GET", nil, nil, true)
    if err != nil {
        // The request is not compliant and will not be added to the list
        log.Println("Skipping request [", i, "]. Error: ", err)
    } else {
        // If no error occurs, append the request to the list of requests that we need to send
        reqs = append(reqs, *req)
    }
}

At this point, we have a list that contains the requests that have to be sent. Let's send them in parallel!

// This slice will contain the responses from the given requests
var response []datastructure.Response

// Send the requests, running N of them in parallel at a time
response = requests.ParallelRequest(reqs, N)

// Print the responses
for i := range response {
    // Dump is a method that prints all the information related to the response
    log.Println("Request [", i, "] -> ", response[i].Dump())
    // Or use the data present in the response directly
    log.Println("Headers: ", response[i].Headers)
    log.Println("Status code: ", response[i].StatusCode)
    log.Println("Time elapsed: ", response[i].Time)
    log.Println("Error: ", response[i].Error)
    log.Println("Body: ", string(response[i].Body))
}

You can find example usage in the example folder of the repository.

SPOILER:

I'm the author of this little library

alessiosavi