
The objective of my backend service is to process 90 million records, handling at least 10 million records per day.

My system config:

  • RAM: 2000 MB
  • CPU: 2 cores

What I am doing right now is something like this:

var wg sync.WaitGroup
// len(evs) is 4455
for range evs {
    wg.Add(1)
    go migrate(&wg)
}
wg.Wait()

func migrate(wg *sync.WaitGroup) {
    defer wg.Done()
    // processing
    time.Sleep(time.Second)
}
  • The resources are not the issue; I can also increase the resources. – apoorva kumar Feb 11 '21 at 05:37
  • Your approach is the correct one, you just need to limit the rate of goroutine creation. The actual number of goroutines you want around depends on the nature of your task (see the sketch after these comments). https://pkg.go.dev/golang.org/x/sync/semaphore – oakad Feb 11 '21 at 06:14
  • And you definitely don't need that `sleep` there. – oakad Feb 11 '21 at 06:15
  • [goroutines are about 4K each](https://stackoverflow.com/questions/8509152/max-number-of-goroutines), there shouldn't be a problem making them all at once. If they're spending a lot of time waiting for a resource, like a network call, that should be fine. But if they're CPU bound with two cores you won't get much benefit beyond maybe 5 goroutines, and the swapping between 4500 goroutines might harm performance. Similar problem with I/O, you can't go faster than your disk. What is migrate doing? – Schwern Feb 11 '21 at 08:07
  • migrate is actually doing some processing on the data and saving it to the DB. That makes sense. So if I limit the goroutines to 5 at once, I can benchmark the processing. – apoorva kumar Feb 11 '21 at 08:28
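
To make the rate limiting suggested above concrete, here is a minimal sketch using golang.org/x/sync/semaphore, the package linked in the comments. The limit of 5, the int event type, and the migrate body are placeholder assumptions, not the asker's actual code:

package main

import (
    "context"
    "log"
    "time"

    "golang.org/x/sync/semaphore"
)

const maxWorkers = 5 // assumed limit; tune for your workload

func migrate(ev int) {
    // stand-in for the real processing and DB save
    time.Sleep(time.Second)
}

func main() {
    evs := make([]int, 4455) // placeholder for the real events
    ctx := context.Background()
    sem := semaphore.NewWeighted(maxWorkers)

    for _, ev := range evs {
        ev := ev
        // Blocks until one of the maxWorkers slots is free,
        // so at most maxWorkers goroutines run at once.
        if err := sem.Acquire(ctx, 1); err != nil {
            log.Fatal(err)
        }
        go func() {
            defer sem.Release(1)
            migrate(ev)
        }()
    }

    // Acquiring all slots waits for the remaining goroutines to finish.
    if err := sem.Acquire(ctx, maxWorkers); err != nil {
        log.Fatal(err)
    }
}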

2 Answers


Without knowing more detail about the type of work you need to do, your approach seems good. Some things to think about:

  • Reuse variables and/or clients in your processing loop, for example reusing an HTTP client instead of recreating one for every request (see the client-reuse sketch after the example below).

  • Depending on how your use case needs to handle failures, it might be efficient to use errgroup. It's a convenience wrapper that collects errors from a group of goroutines and, when created with errgroup.WithContext, cancels the remaining goroutines on the first error, possibly saving you a lot of time.

  • In the migrate function, be aware of the caveats regarding closures and goroutines, as shown in the example below.

package main

import (
    "fmt"
    "net/http"

    "golang.org/x/sync/errgroup"
)

func main() {
    g := new(errgroup.Group)
    var urls = []string{
        "http://www.someasdfasdfstupidname.com/",
        "ftp://www.golang.org/",
        "http://www.google.com/",
    }
    for _, url := range urls {
        url := url // https://golang.org/doc/faq#closures_and_goroutines
        g.Go(func() error {
            // Fetch the URL; the first non-nil error is returned by g.Wait().
            resp, err := http.Get(url)
            if err == nil {
                resp.Body.Close()
            }
            return err
        })
    }

    fmt.Println("waiting")
    if err := g.Wait(); err == nil {
        fmt.Println("Successfully fetched all URLs.")
    } else {
        fmt.Println(err)
    }
}
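
On the client-reuse point above, here is a minimal sketch of sharing a single http.Client across goroutines; the URLs and timeout are placeholder assumptions. http.Client is safe for concurrent use and pools connections internally, so reusing one avoids per-request setup cost:

package main

import (
    "fmt"
    "net/http"
    "sync"
    "time"
)

func main() {
    // One shared client instead of creating a new one per request.
    client := &http.Client{Timeout: 10 * time.Second}

    urls := []string{ // placeholder URLs
        "http://www.google.com/",
        "http://www.example.com/",
    }

    var wg sync.WaitGroup
    for _, url := range urls {
        url := url // copy for the closure, as noted above
        wg.Add(1)
        go func() {
            defer wg.Done()
            resp, err := client.Get(url)
            if err != nil {
                fmt.Println(err)
                return
            }
            resp.Body.Close()
        }()
    }
    wg.Wait()
}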
– martinni39

I found the solution: to handle this much processing, I limited the number of goroutines to 50 and increased the number of cores from 2 to 5.
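
For reference, here is a minimal sketch of one way to cap concurrency at 50 as described above, using a fixed pool of worker goroutines fed from a channel; the event type and the migrate body are placeholder assumptions:

package main

import (
    "sync"
    "time"
)

type event struct{} // placeholder for the real event type

func migrate(ev event) {
    // stand-in for the real processing and DB save
    time.Sleep(time.Second)
}

func main() {
    evs := make([]event, 4455) // placeholder input
    jobs := make(chan event)
    var wg sync.WaitGroup

    // 50 long-lived workers instead of one goroutine per event.
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for ev := range jobs {
                migrate(ev)
            }
        }()
    }

    for _, ev := range evs {
        jobs <- ev
    }
    close(jobs) // lets the workers' range loops finish
    wg.Wait()
}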