I want to do the same thing in Go as asked here.
I'm parsing a huge log file, and I need to process it line by line, deserializing each line into a struct. The data may come from any source (a file, the network, etc.), so my function receives an io.Reader. Since the file is huge, I want to split the work among many goroutines.
I could have split the stream easily using io.Pipe etc. However, I need to split the file without cutting any line in the middle; for example, splitting it in half must not break the line at the halfway point. That way, each goroutine receives its own io.Reader and works on a different part of the file.
Sometimes I also need to pass an io.MultiReader to my function. In that case, I would do the same again, so it's not necessarily a single file (but mostly it is).
func scan(r io.Reader, pf ProcessFunc) {
	// need to split `r` here if `r` is an io.ReadSeeker:
	//
	// run goroutine #1 with 50% of the stream (uses bufio.Scanner)
	// run goroutine #2 with 50% of the stream (uses bufio.Scanner)
	//
	// another goroutine receives the deserialized values and sends
	// them to the ProcessFunc for processing further down the pipeline
}
Let's say the data is like this:
foo1 bar2
foo3 bar4
foo5 bar6
foo7 bar8
Goroutine #1 will get an io.Reader like so:
foo1 bar2
foo3 bar4
And goroutine #2 will get an io.Reader like so:
foo5 bar6
foo7 bar8
But not like this:
o5 bar6 -> breaks the line in the second io.Reader
foo7 bar8