Here's an interesting situation I ran into. I need to read a file and populate a map from its contents, after some data manipulation using goroutines. Here's the simplified problem statement and example:
Generate the required data by running gen_data.sh:
#!/bin/bash
rm some.dat || :
for i in `seq 1 10000`; do
    echo "$i `date` tx: $RANDOM rx:$RANDOM" >> some.dat
done
If I read those lines in some.dat into a map[int]string without goroutines, using loadtoDict.go, the map retains alignment (that is, the key and the 1st word of the value are the same; see the output below).
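To make "alignment" concrete, here is a minimal sketch of a check that a map key matches the first word of its line. The helper name isAligned is hypothetical and is not part of either program in this post; the two sample lines are taken from the outputs shown further down.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isAligned reports whether the first whitespace-separated field of line
// parses to the same integer as key.
// Hypothetical helper for illustration only; not in the original programs.
func isAligned(key int, line string) bool {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return false
	}
	n, err := strconv.Atoi(fields[0])
	return err == nil && n == key
}

func main() {
	// Key 8676 holds the line that starts with 8676: aligned.
	fmt.Println(isAligned(8676, "8676 Mon Dec 23 15:52:24 PST 2019 tx: 17718 rx:1133"))
	// Key 11 holds the line that starts with 22: not aligned.
	fmt.Println(isAligned(11, "22 Mon Dec 23 15:52:19 PST 2019 tx: 25688 rx:7602"))
}
```

Running a check like this over every entry of the map would show whether a given loader keeps keys and lines in step.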
In real life I do need to process the lines (which is expensive) before they are loaded into the map. Using goroutines speeds up my dictionary creation, and this is an important requirement for the real problem.
loadtoDict.go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

var (
	fileName = "some.dat"
)

func checkerr(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

func main() {
	ourDict := make(map[int]string)
	f, err := os.Open(fileName)
	checkerr(err)
	defer f.Close()
	fscanner := bufio.NewScanner(f)
	indexPos := 1
	for fscanner.Scan() {
		text := fscanner.Text()
		//fmt.Println("text", text)
		ourDict[indexPos] = text
		indexPos++
	}
	for i, v := range ourDict {
		fmt.Printf("%d: %s\n", i, v)
	}
}
Running:
$ ./loadtoDict
...
8676: 8676 Mon Dec 23 15:52:24 PST 2019 tx: 17718 rx:1133
2234: 2234 Mon Dec 23 15:52:20 PST 2019 tx: 13170 rx:15962
3436: 3436 Mon Dec 23 15:52:21 PST 2019 tx: 17519 rx:5419
6177: 6177 Mon Dec 23 15:52:23 PST 2019 tx: 5731 rx:5449
Notice how the key and the 1st word of each line match. However, if I use goroutines to load my map, this goes awry:
async_loadtoDict.go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"sync"
)

var (
	fileName = "some.dat"
	mu       = &sync.RWMutex{}
	MAX      = 9000
)

func checkerr(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}

func main() {
	ourDict := make(map[int]string)
	f, err := os.Open(fileName)
	checkerr(err)
	defer f.Close()
	fscanner := bufio.NewScanner(f)
	indexPos := 1
	var wg sync.WaitGroup
	sem := make(chan int, MAX)
	defer close(sem)
	for fscanner.Scan() {
		text := fscanner.Text()
		wg.Add(1)
		sem <- 1
		go func() {
			mu.Lock()
			defer mu.Unlock()
			ourDict[indexPos] = text
			indexPos++
			<-sem
			wg.Done()
		}()
	}
	wg.Wait()
	for i, v := range ourDict {
		fmt.Printf("%d: %s\n", i, v)
	}
}
Output:
$ ./async_loadtoDict
...
11: 22 Mon Dec 23 15:52:19 PST 2019 tx: 25688 rx:7602
5716: 6294 Mon Dec 23 15:52:23 PST 2019 tx: 28488 rx:3572
6133: 4303 Mon Dec 23 15:52:21 PST 2019 tx: 24286 rx:1565
7878: 9069 Mon Dec 23 15:52:25 PST 2019 tx: 16863 rx:24234
8398: 7308 Mon Dec 23 15:52:23 PST 2019 tx: 4321 rx:20642
9566: 3489 Mon Dec 23 15:52:21 PST 2019 tx: 14447 rx:12630
2085: 2372 Mon Dec 23 15:52:20 PST 2019 tx: 14375 rx:24151
This is despite guarding the ingestion ourDict[indexPos] with a mutex. I'd like my map index to align with the ingestion attempt, i.e. the Nth line read should end up under key N.
Thanks!