Note: I optimized and improved the solution below and released it as a library here: github.com/icza/backscanner
bufio.Scanner uses an io.Reader as its source, which does not support seeking or reading from arbitrary positions, so it cannot scan lines starting from the end. bufio.Scanner can only read a given part of the input after all data preceding it has been read (that is, it can reach the end of the file only by reading all of the file's content first).
So we need a custom solution to implement such functionality. Fortunately os.File does support reading from arbitrary positions, as it implements both io.Seeker and io.ReaderAt (either of them would be sufficient for what we need).
Scanner that returns lines going backward, starting at the end
Let's construct a Scanner which scans lines backward, starting with the last line. For this, we'll utilize an io.ReaderAt. The following implementation uses an internal buffer into which data is read in chunks, starting from the end of the input. The size of the input must also be passed (this is really the position from which we want to start reading, and it need not be the end position).
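import (
    "bytes"
    "io"
)

// Scanner reads lines backward from an io.ReaderAt.
type Scanner struct {
    r   io.ReaderAt
    pos int    // offset in the input of the first byte of buf
    err error  // sticky error; io.EOF once the start is reached
    buf []byte // data read but not yet returned
}

// NewScanner returns a Scanner that lists lines backward from pos.
func NewScanner(r io.ReaderAt, pos int) *Scanner {
    return &Scanner{r: r, pos: pos}
}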
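// readMore prepends the next chunk (at most 1024 bytes) to the buffer.
func (s *Scanner) readMore() {
    if s.pos == 0 {
        s.err = io.EOF
        return
    }
    size := 1024
    if size > s.pos {
        size = s.pos
    }
    s.pos -= size
    buf2 := make([]byte, size, size+len(s.buf))
    // ReadAt either fills buf2 completely or returns a non-nil error.
    // A full read that ends exactly at the end of the input may still
    // report io.EOF, which is not a failure here.
    n, err := s.r.ReadAt(buf2, int64(s.pos))
    if err == io.EOF && n == size {
        err = nil
    }
    s.err = err
    if s.err == nil {
        s.buf = append(buf2, s.buf...)
    }
}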
// Line returns the next line going backward, along with the offset at
// which it starts in the input. It returns io.EOF after the first line.
func (s *Scanner) Line() (line string, start int, err error) {
    if s.err != nil {
        return "", 0, s.err
    }
    for {
        lineStart := bytes.LastIndexByte(s.buf, '\n')
        if lineStart >= 0 {
            // We have a complete line: cut it off the end of the buffer.
            line, s.buf = string(dropCR(s.buf[lineStart+1:])), s.buf[:lineStart]
            return line, s.pos + lineStart + 1, nil
        }
        // Need more data:
        s.readMore()
        if s.err != nil {
            if s.err == io.EOF {
                // Start of input reached: the rest of the buffer is the first line.
                if len(s.buf) > 0 {
                    return string(dropCR(s.buf)), 0, nil
                }
            }
            return "", 0, s.err
        }
    }
}
// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}
Example using it:
// The example additionally needs the fmt and strings imports.
func main() {
    scanner := NewScanner(strings.NewReader(src), len(src))
    for {
        line, pos, err := scanner.Line()
        if err != nil {
            fmt.Println("Error:", err)
            break
        }
        fmt.Printf("Line start: %2d, line: %s\n", pos, line)
    }
}

const src = `Start
Line1
Line2
Line3
End`
Output:
Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Line start:  6, line: Line1
Line start:  0, line: Start
Error: EOF
Notes:
- The above Scanner does not limit the maximum line length; it handles lines of any length.
- The above Scanner handles both \n and \r\n line endings (ensured by the dropCR() function).
- You may pass any start position, not just the size / length; lines are then listed backward from there (continuation).
- The above Scanner does not reuse buffers; it always creates new ones when needed. It would be enough to (pre)allocate 2 buffers and use those wisely. The implementation would become more complex, and it would introduce a maximum line length limit.
Using it with a file
To use this Scanner with a file, you may use os.Open() to open a file. Note that *os.File implements io.ReaderAt. Then you may use File.Stat() to obtain info about the file (os.FileInfo), including its size (length):
f, err := os.Open("a.txt")
if err != nil {
    panic(err)
}
// Defer the close right away so the file is also closed if Stat fails.
defer f.Close()

fi, err := f.Stat()
if err != nil {
    panic(err)
}
scanner := NewScanner(f, int(fi.Size()))
Looking for a substring in a line
If you're looking for a substring in a line, then simply use the above Scanner, which returns the starting position of each line, reading lines from the end.
You may check for the substring in each line using strings.Index(), which returns the substring's position inside the line; if found, add the line start position to it.
Let's say we're looking for the "ine2" substring (which is part of the "Line2" line). Here's how you can do that:
scanner := NewScanner(strings.NewReader(src), len(src))
what := "ine2"
for {
    line, pos, err := scanner.Line()
    if err != nil {
        fmt.Println("Error:", err)
        break
    }
    fmt.Printf("Line start: %2d, line: %s\n", pos, line)
    if i := strings.Index(line, what); i >= 0 {
        fmt.Printf("Found %q at line position: %d, global position: %d\n",
            what, i, pos+i)
        break
    }
}
Output:
Line start: 24, line: End
Line start: 18, line: Line3
Line start: 12, line: Line2
Found "ine2" at line position: 1, global position: 13
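One detail worth noting: strings.Index() reports the first occurrence inside a line. If the goal is the last occurrence in the whole input, and a line may contain the substring more than once, strings.LastIndex() might be the better fit. The sketch below shows the difference. The helper name lastInLine and the sample line are made up for this illustration.

```go
package main

import (
    "fmt"
    "strings"
)

// lastInLine is a hypothetical helper: given a line and the global offset
// at which it starts, it returns the global offset of the last occurrence
// of what in that line, or -1 if the line does not contain it.
func lastInLine(line string, lineStart int, what string) int {
    i := strings.LastIndex(line, what)
    if i < 0 {
        return -1
    }
    return lineStart + i
}

func main() {
    // Assume the scanner returned this line, starting at offset 12.
    line, start := "ine2 Line2", 12
    first := start + strings.Index(line, "ine2")
    last := lastInLine(line, start, "ine2")
    fmt.Println(first, last) // prints 12 18
}
```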