Why can't I speed up my program by concurrently computing different parts of a slice?

Question

I wrote a program to compute the similarity scores between a query and a target document. A sketch of it is below:

package main

import "sync"

type Dictionary struct {
    Documents map[int][]string
    Queries   map[int][]string
}

type Similarity struct {
    QID   int
    DocID int
    Sim   float64
}

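func (dict *Dictionary) CalScore(qID, docID int) float64 {
    query := dict.Queries[qID]
    document := dict.Documents[docID]
    score := calculation(query, document) // some counting and calculation
    // for example: count how many words in query are also in document and so on,
    // like tf-idf things.
    return score
}

// calculation is a placeholder for the real scoring code, which is not shown here.
// This stand-in only counts how many query words also appear in the document.
func calculation(query, document []string) float64 {
    inDoc := make(map[string]bool, len(document))
    for _, w := range document {
        inDoc[w] = true
    }
    count := 0
    for _, w := range query {
        if inDoc[w] {
            count++
        }
    }
    return float64(count)
}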

// Calculate the similarity scores for each group.
func SimWorker(index int, dict *Dictionary, simsList *[][]Similarity, wg *sync.WaitGroup) {
    defer wg.Done()
    for i, sim := range (*simsList)[index] {
        // CalScore looks up the words by ID and computes the score; this is the time-consuming part.
        (*simsList)[index][i].Sim = dict.CalScore(sim.QID, sim.DocID)
    }
}

func main() {
    dict := Dictionary{
        // All data filled in.
    }
    simsList := [][]Similarity{
        // Slice of groups of structs containing
        // pairs of query id and doc id.
        // All sims scores are 0.0 initially.
    }

    var wg sync.WaitGroup
    for i := range simsList {
        wg.Add(1)
        go SimWorker(i, &dict, &simsList, &wg)
    }
    wg.Wait() // wait until all goroutines finish

    // Next procedures to the simsList
}

Basically, I have a slice of groups of query-doc ID pairs. Within each group the query ID is the same, while the doc IDs are all different. The procedure is straightforward: I get the strings from the dictionary, then compute the score with some algorithm. At first I did all of this sequentially (without goroutines); it took several minutes to compute the scores for each group and several hours in total. I then expected a speedup from introducing goroutines as shown above. I created one goroutine per group, since each group accesses different parts of the Dictionary and of the [][]Similarity. But the speed did not improve; if anything, it got slightly worse (I'm using around 10 goroutines). Why did this happen, and how can I change the program to actually speed up the computation?

Can you please add more info about `dict.CalScore`? I'm mainly interested in whether it does anything that would block the other goroutines (e.g. locks a mutex or accesses an external resource that might limit concurrency). Adding some logging to `dict.CalScore` might help identify whether it is blocking. — Brits, Feb 23 '20 at 00:11
@Brits In `dict.CalScore`, there is nothing special other than retrieving the actual string lists from `dict` by ID, counting the words and doing some math. One more thing to mention: I've updated the code. I use the `math` package to calculate `log`. — wangx1ng, Feb 23 '20 at 11:01
OK - so is the document to be searched held in the map, or are you passing a filename to `calculation` (which then reads the document from disk)? Unfortunately, without more info, ideally a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example), it's difficult to say why this does not benefit from concurrency (and not all applications will). If you cannot provide an example, consider [profiling](https://blog.golang.org/profiling-go-programs) your application to determine which parts take a long time to run. — Brits, Feb 23 '20 at 19:02
@Brits The document is looked up in the map, and the documents map is loaded into memory in advance in `main()`. After that point there are no disk reads. I will try to make a minimal example. — wangx1ng, Feb 23 '20 at 21:41
OK - worth checking CPU use while the app is running (both the original and the goroutine version) to see whether the CPU is fully utilised. Also check memory use: if you are loading huge amounts of data, some of it may get swapped out to disk (depending on your setup). — Brits, Feb 23 '20 at 22:19
Assuming no obvious concurrency errors, you should use `pprof` to examine your program for bottlenecks and such. There are many good resources to learn about it. One of my favorites is [Dave Cheney's presentation](https://www.youtube.com/watch?v=nok0aYiGiYA) — Benny Jobigan, Feb 24 '20 at 12:37
How many CPUs does your machine have? If your program is CPU-bound and you have 4 CPUs available but run 10 goroutines, they will be constrained by the CPU. A good default size for your goroutine pool would be the number of available CPUs, or NumCPU - 1. It should be easy to benchmark: create a minimal subset of data that runs in ~1 minute, start with a pool of 1, and increase it until performance no longer improves. — dm03514, Feb 24 '20 at 14:35

score 0 · Answer 1 · answered Feb 24 '20 at 06:38


Add this line to your entry point:

 runtime.GOMAXPROCS(runtime.NumCPU())

It allows your program to use all the cores in your PC. Without this line, your program will run concurrently, but not in parallel.

GoDoc

Itsme

Do I really need to do this after Go 1.5? `runtime.GOMAXPROCS(0)` returns 4 on my system. – wangx1ng Feb 24 '20 at 12:32
Unnecessary unless you're running an ancient version of Go. "As of Go 1.5, the default value of GOMAXPROCS is the number of CPUs (whatever your operating system considers to be a CPU) visible to the program at startup." [source](https://dave.cheney.net/tag/gomaxprocs) – Benny Jobigan Feb 24 '20 at 12:33
