You can use bufio.Scanner with bufio.ScanWords, tokenize on whitespace boundaries, and compare non-whitespace sequences to your delimiter:
    scanner := bufio.NewScanner(reader)
    // you can implement your own split function,
    // but ScanWords will suffice for your example
    scanner.Split(bufio.ScanWords)
    for scanner.Scan() {
        // scanner.Bytes() efficiently exposes the file contents
        // as slices of a larger buffer
        if bytes.HasPrefix(scanner.Bytes(), []byte("START")) {
            // ... keep scanning until the end delimiter
        }
        // copying tokens through is quite simple (note that ScanWords
        // drops the whitespace between tokens, so write your own
        // separator if you need one):
        _, err := writer.Write(scanner.Bytes())
        if err != nil {
            return err
        }
    }
    // Scan() returns false on both EOF and failure, so check which it was
    if err := scanner.Err(); err != nil {
        return err
    }
This keeps the amount of data held in memory bounded: a single token can be at most bufio.MaxScanTokenSize (64 KiB) unless you give the scanner a larger buffer with scanner.Buffer.
Note that if you want to use multiple goroutines, you'll need to copy the data first, since scanner.Bytes() returns a slice that is only valid until the next call to .Scan(). If you choose to do that, though, I wouldn't bother with a scanner.
For what it's worth, loading a 3MB file into memory is not such a bad idea on a general-purpose computer nowadays; I would only think twice if it were an order of magnitude bigger. It would almost certainly be faster to use bytes.Split with your delimiters.
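A rough sketch of the in-memory approach, assuming "START" is the delimiter. The splitOnDelimiter name and the sample data are mine; in practice the data would come from something like os.ReadFile.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitOnDelimiter (hypothetical helper) splits data that is already fully
// loaded in memory on every occurrence of delim. The returned slices share
// memory with data, so no per-piece copying happens.
func splitOnDelimiter(data []byte, delim string) [][]byte {
	return bytes.Split(data, []byte(delim))
}

func main() {
	// Stand-in for the file contents.
	data := []byte("header START first section START second section")
	for i, part := range splitOnDelimiter(data, "START") {
		fmt.Printf("%d: %q\n", i, part)
	}
}
```

Unlike ScanWords, this keeps the whitespace inside each piece untouched, since only the delimiter itself is removed.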