I am writing a web crawler to learn Go.

My current implementation uses 10 goroutines to fetch websites, and I want to limit how many times per second I can hit a given hostname.

What is the best (thread-safe) approach to do this?

user2348668
  • Please post an example of your code. – Jose Bagatelli Sep 30 '16 at 22:40
  • See: How do I execute commands many times per second in Golang: http://stackoverflow.com/questions/39385883/how-do-i-execute-commands-many-many-times-per-second-in-golang – Oct 01 '16 at 07:35

1 Answer

A channel is a synchronization mechanism that any number of goroutines can use concurrently, which makes it a natural way to coordinate your workers. You could combine one with a time.Ticker to allow a given number of function calls per period.

import "time"

// A PeriodicResource is a receive-only channel that is refilled periodically.
type PeriodicResource <-chan bool

// NewPeriodicResource returns a buffered channel of capacity count that is
// topped back up to count after every reset interval. Receiving one value
// before each call limits a function to at most count calls per interval.
// Note: the ticker is never stopped, so this goroutine lives for the rest of
// the program, and no values are available until the first tick.
func NewPeriodicResource(count int, reset time.Duration) PeriodicResource {
    ticker := time.NewTicker(reset)
    c := make(chan bool, count)

    go func() {
        for {
            // Await the periodic timer
            <-ticker.C

            // Refill the buffer up to capacity. Only this goroutine sends,
            // and receivers only free slots, so these sends never block.
            for i := len(c); i < count; i++ {
                c <- true
            }
        }
    }()

    return c
}

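A single goroutine waits for each ticker event and fills the buffered channel back up to its capacity. If consumers do not drain the buffer, later ticks merely top it up again, so unused values never accumulate beyond count. Receiving from the channel before each action therefore lets you perform that action at most count times per tick interval. (This caps calls per interval rather than per sliding window: a full buffer drained just before a tick and again just after it allows up to twice count calls in a short span.) For example, I may want to call doSomething() no more than five times per second.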

r := NewPeriodicResource(5, time.Second)
for {
    // Block until a value can be dequeued from the PeriodicResource
    <-r

    // Each call synchronously draws one value from the periodic resource
    doSomething()
}

Naturally, the same channel could be used to call go doSomething(), which would fan out at most five goroutines per second.

Ben Campbell