
When building an app that displays a text editor, it would be nice to show the user the cursor position as line and offset (row and column). This is an example method that does this.

/// Find row and column of cursor position
/// Checks indexStarts for the index of the start larger than the selected position and
/// calculates the distance between cursor position and the previous line start.
func setColumnAndRow() {
    // Convert NSRange to Range<String.Index>
    let selectionRange = Range(selectedRange(), in: string)!

    // Retrieve the first line start greater than the cursor position
    if let nextIndex = indexStarts.firstIndex(where: { $0 > selectionRange.lowerBound }) {
        // The line with the cursor is the one before that
        let lineIndex = nextIndex - 1
        // Use String.distance(from:to:) to determine the column position
        let distance = string.distance(from: indexStarts[lineIndex],
                                       to: selectionRange.lowerBound)
        print("column: \(distance), row: \(lineIndex)")
    } else {
        // Cursor is at the very end of the text; column 0 is only exact
        // when the text ends with a line break
        print("column: 0, row: \(indexStarts.count - 1)")
    }
}

According to my research, Apple does not offer any API for this purpose; in fact, this is not even a feature of the Xcode editor. I concluded that I need to build an array holding the character position of each line start, as used above. This array must be updated every time anything changes in the NSTextField, so generating it must be efficient and fast.
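Once such a line start array exists, the row and column lookup itself can be a binary search instead of a linear scan. This is only a minimal sketch under assumptions: `rowAndColumn(for:in:)` is a hypothetical helper, the array holds plain integer offsets (for example UTF-16 offsets as reported by NSRange), it is sorted, starts with 0, and has no `Int.max` sentinel at the end.

```swift
/// Returns the zero-based row and column for `offset`, given the sorted
/// offsets at which each line starts. Hypothetical helper for illustration:
/// `lineStarts` must begin with 0 and use the same unit as `offset`.
func rowAndColumn(for offset: Int, in lineStarts: [Int]) -> (row: Int, column: Int) {
    precondition(offset >= 0, "offset must not be negative")
    precondition(!lineStarts.isEmpty && lineStarts[0] == 0, "first line must start at 0")

    var low = 0
    var high = lineStarts.count          // search window is [low, high)
    // Count how many line starts are <= offset.
    while low < high {
        let mid = (low + high) / 2
        if lineStarts[mid] <= offset {
            low = mid + 1
        } else {
            high = mid
        }
    }
    let row = low - 1                    // last line start that is <= offset
    return (row, offset - lineStarts[row])
}

// Line starts for the text "ab\ncd\n\nx"
let exampleStarts = [0, 3, 6, 7]
print(rowAndColumn(for: 4, in: exampleStarts))
```

The lookup then costs O(log n) per cursor move, so only the construction of the array remains expensive.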

I found/assembled four methods to generate the line start array:

1st method

Uses number of glyphs and lineFragmentRect - This is by far the slowest implementation

func lineStartsWithLayout() -> [Int] {
        // about 100 times slower than below

        let start = ProcessInfo.processInfo.systemUptime

        var lineStarts:[Int] = []
        let layoutManager = layoutManager!
        let numberOfGlyphs = layoutManager.numberOfGlyphs
        var lineRange: NSRange = NSRange()
        
        var indexOfGlyph: Int = 0
        lineStarts.append(indexOfGlyph)
        while indexOfGlyph < numberOfGlyphs {
            layoutManager.lineFragmentRect(
                                forGlyphAt: indexOfGlyph
                              , effectiveRange: &lineRange
                              , withoutAdditionalLayout: false
                          )
            indexOfGlyph = NSMaxRange(lineRange)
            lineStarts.append(indexOfGlyph)
        }
        lineStarts.append(Int.max)
        Logger.write("\(ProcessInfo.processInfo.systemUptime-start) s")
        return lineStarts
}

2nd method

Uses the paragraphs array to get the individual line lengths. According to Apple this may not be recommended, as it might create many objects. That is very likely not a problem here, since we only read the paragraphs array and do not modify it. In effect it is nearly as fast as the fastest implementation, so it is my recommendation if you use Objective-C.

func lineStartsWithParagraphs() -> [Int] {
    // about 100 times faster than above
    let start = ProcessInfo.processInfo.systemUptime

    var lineStarts: [Int] = []
    var lineStart = 0
    lineStarts.append(lineStart)
    for p in textStorage?.paragraphs ?? [] {
        lineStart += p.length
        lineStarts.append(lineStart)
    }
    lineStarts.append(Int.max)
    Logger.write("\(ProcessInfo.processInfo.systemUptime-start) s")
    return lineStarts
}

3rd method

Uses enumerateLines. Expected to be very fast, but in the benchmark below it is about three times slower than lineStartsWithParagraphs. It is quite Swifty, though.

func lineStartsByEnumerating() -> [Int] {
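    let start = ProcessInfo.processInfo.systemUptime
    var lineStarts: [Int] = []
    var lineStart = 0
    lineStarts.append(lineStart)
    string.enumerateLines { line, _ in
        // `line` excludes its line terminator, so add one for it
        // (assumes a single-character terminator such as "\n"; the final
        // entry overshoots by one if the text does not end with a line break).
        // Count UTF-16 units so the offsets match NSRange locations.
        lineStart += line.utf16.count + 1
        lineStarts.append(lineStart)
    }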
    lineStarts.append(Int.max)
    Logger.write("\(ProcessInfo.processInfo.systemUptime-start) s")
    return lineStarts
}

4th method

Uses lineRange from Swift. Fastest and probably the best implementation for Swift; it cannot be used in Objective-C. It is a little more complicated to use because, for example, NSTextView.selectedRange returns an NSRange, which must be converted to Range<String.Index>.

func indexStartsByLineRange() -> [String.Index] {
    /*
     // Convert Range<String.Index> to NSRange:
     let range   = s.startIndex..<s.endIndex
     let nsRange = NSRange(range, in: s)

     // Convert NSRange to Range<String.Index> (the result is optional):
     let nsRange = NSMakeRange(0, 4)
     let range   = Range(nsRange, in: s)
     */
    let start = ProcessInfo.processInfo.systemUptime
    var indexStarts:[String.Index] = []
    var index = string.startIndex
    indexStarts.append(index)
    while index != string.endIndex {
        let range = string.lineRange(for: index..<index)
        index = range.upperBound
        indexStarts.append(index)
    }
    Logger.write("\(ProcessInfo.processInfo.systemUptime-start) s")
    return indexStarts
}

Benchmark (NSTextView with 32000 lines on M2 with Ventura):

| Method | Time |
|---|---|
| 1. lineStartsWithLayout | 1.452 s |
| 2. lineStartsWithParagraphs | 0.020 s |
| 3. lineStartsByEnumerating | 0.065 s |
| 4. indexStartsByLineRange | 0.019 s |

I would prefer indexStartsByLineRange, but I am interested to hear other opinions. In Objective-C I would stick to the algorithm in lineStartsWithParagraphs, taking into account that some calls must be adapted.

  • Iterating over all content to find all line boundaries is kind of a non-starter for any large text field. Fundamentally, `String` is the wrong tool for the job. For large text fields, you need something like a [Rope](https://en.wikipedia.org/wiki/Rope_(data_structure)), which can do a better job at local modifications/updates without needing to rescan the whole thing often – Alexander May 30 '23 at 14:26
  • How about doing a correction when something changes? For example: remove 4 characters in line 200 -> subtract 4 from line starts 201+. – Willeke May 31 '23 at 08:37
  • Is this question about `NSTextView` or `NSTextField`? Or is it about finding lines in a string? See [Get line and column number from absolute position and vice versa](https://stackoverflow.com/questions/47207611/ios-string-get-line-and-column-number-from-absolute-position-and-vice-versa) – Willeke May 31 '23 at 08:42
  • @Alexander/@Willeke Thanks for the answers, but the idea was not to build a full-fledged text editor. 20 ms to scan 32000 lines of text might be fast enough to do it even for every letter typed, but your improvements are welcome. The 2nd proposal from Willeke was not known to me and I will try it. It is sad that the Apple SDK does not provide any API for this. – Lego Esprit Jun 02 '23 at 07:56
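
The incremental correction suggested in the comments (adjust only the line starts after the edited line) could be sketched roughly as follows. `shiftLineStarts` is a hypothetical helper, and the sketch assumes that the edit neither adds nor removes line breaks and that the array has no `Int.max` sentinel (adding a positive delta to it would overflow).

```swift
/// Shifts every line start after `editedLine` by `delta` units.
/// Hypothetical helper: `delta` is negative for deletions and positive
/// for insertions, and the edit is assumed to stay within one line.
func shiftLineStarts(_ lineStarts: inout [Int], after editedLine: Int, by delta: Int) {
    guard delta != 0, editedLine >= 0, editedLine + 1 < lineStarts.count else { return }
    for i in (editedLine + 1)..<lineStarts.count {
        lineStarts[i] += delta
    }
}

// Line starts for "ab\ncd\n\nx"; delete one character from line 1 ("cd").
var shiftedStarts = [0, 3, 6, 7]
shiftLineStarts(&shiftedStarts, after: 1, by: -1)
print(shiftedStarts)
```

Edits that insert or delete line breaks would additionally need entries inserted into or removed from the array, which this sketch does not cover.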

2 Answers

0

Triggered by Willeke's comment, I checked different possibilities for calculating the line number and cursor position, and the result surprised me.

func lineNumberRegularExpression() -> (Int, Int) {
    let start = ProcessInfo.processInfo.systemUptime;
    let selectionRange: NSRange = selectedRange()
    let regex = try! NSRegularExpression(pattern: "\n", options: [])
    let lineNumber = regex.numberOfMatches(in: string, options: [], range: NSMakeRange(0, selectionRange.location)) + 1
    var column = 0
    if let stringIndexSelection = Range(selectionRange, in: string) {
        let lineRange = string.lineRange(for: stringIndexSelection)
        column = string.distance(from: lineRange.lowerBound, to: stringIndexSelection.upperBound)
    }
    print("Using RegEx     :\(ProcessInfo.processInfo.systemUptime-start) s")
    return (lineNumber, column)
}

func lineNumberScanner() -> (Int, Int) {
    let start = ProcessInfo.processInfo.systemUptime;
    let selectionRange: NSRange = selectedRange()
    let stringIndexSelection = Range(selectionRange, in: string)!
    let startOfString = string[..<stringIndexSelection.upperBound]
    let scanner = Scanner(string: String(startOfString))
    scanner.charactersToBeSkipped = nil
    var lineNumber = 0
    while (nil != scanner.scanUpToCharacters(from: CharacterSet.newlines) && !scanner.isAtEnd) {
        lineNumber += 1
        scanner.currentIndex = scanner.string.index(after: scanner.currentIndex)
    }
    let lineRange = string.lineRange(for: stringIndexSelection)
    let column = string.distance(from: lineRange.lowerBound, to: stringIndexSelection.upperBound)
    print("Using scanner   :\(ProcessInfo.processInfo.systemUptime-start) s")
    return (lineNumber, column)
}

func lineNumberComponents() -> (Int, Int) {
    let start = ProcessInfo.processInfo.systemUptime;
    let stringIndexSelection = Range(selectedRange(), in: string)!
    let startOfString = string[..<stringIndexSelection.upperBound]
    let lineNumber = startOfString.components(separatedBy: "\n").count
    let lineRange = string.lineRange(for: stringIndexSelection)
    let column = string.distance(from: lineRange.lowerBound, to: stringIndexSelection.upperBound)
    print("Using components:\(ProcessInfo.processInfo.systemUptime-start) s")
    return (lineNumber, column)
}

func lineNumberEnumerate() -> (Int, Int) {
    let start = ProcessInfo.processInfo.systemUptime;

    let stringIndexSelection = Range(selectedRange(), in: string)!
    let startOfString = string[..<stringIndexSelection.upperBound]
    var lineNumber = 0
    startOfString.enumerateLines { _, _ in
        lineNumber += 1
    }

    let lineRange = string.lineRange(for: stringIndexSelection)
    let column = string.distance(from: lineRange.lowerBound, to: stringIndexSelection.upperBound)
    if 0 == column {
        lineNumber += 1
    }
    print("Using enumerate :\(ProcessInfo.processInfo.systemUptime-start) s")
    return (lineNumber, column)

}

func lineNumberReduce() -> (Int, Int) {
    let start = ProcessInfo.processInfo.systemUptime;

    let stringIndexSelection = Range(selectedRange(), in: string)!
    let startOfString = string[string.startIndex..<stringIndexSelection.upperBound]
    let lineNumber = startOfString.reduce(into: 1) { (counts, letter) in
        if "\n" == letter {
            counts += 1
        }
    }

    let lineRange = string.lineRange(for: stringIndexSelection)
    let column = string.distance(from: lineRange.lowerBound, to: stringIndexSelection.upperBound)
    print("Using reduce    :\(ProcessInfo.processInfo.systemUptime-start) s")
    return (lineNumber, column)

}

Please note the tiny differences between the methods; this was the only way to get identical results, except that the method using reduce sometimes yields line numbers that are too small for some texts. Oddly enough, using the regular expression was the fastest: it was always under 10 ms for a text with 32000 lines.
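
One possible cause of the reduce variant undercounting (an assumption, since the failing texts are not shown) is Windows-style line endings: Swift treats "\r\n" as a single `Character`, which does not compare equal to "\n". A small sketch with a made-up sample string illustrates the difference between iterating Characters and iterating UTF-16 code units:

```swift
// Sample text with "\r\n" line endings (made up for illustration).
let crlfSample = "one\r\ntwo\r\nthree"

// Character view: "\r\n" is one Character and is not equal to "\n".
let linesByCharacter = crlfSample.reduce(into: 1) { count, character in
    if character == "\n" { count += 1 }
}

// UTF-16 view: every line feed code unit (0x0A) is seen individually.
let linesByCodeUnit = crlfSample.utf16.reduce(into: 1) { count, unit in
    if unit == 0x0A { count += 1 }
}

print(linesByCharacter, linesByCodeUnit)
```

If this is indeed the cause, reducing over `utf16` (or `unicodeScalars`) instead of Characters should make the counts agree with the other methods.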

Be careful not to use `.newlines` in place of "\n": a "\r\n" line ending then counts as two line breaks, so you will count twice the number of lines.
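
The double counting can be reproduced with a short sketch. The sample string is made up; `components(separatedBy:)` and `enumerateLines` are the Foundation calls already used in this answer.

```swift
import Foundation

// Three lines separated by "\r\n" (made-up sample).
let newlineSample = "one\r\ntwo\r\nthree"

// Splitting on the character set treats "\r" and "\n" as separate
// separators, so an empty component appears inside every "\r\n" pair.
let parts = newlineSample.components(separatedBy: CharacterSet.newlines)

// enumerateLines treats "\r\n" as one line terminator.
var enumeratedLines = 0
newlineSample.enumerateLines { _, _ in enumeratedLines += 1 }

print(parts.count, enumeratedLines)
```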

| Method | Benchmark | Comment |
|---|---|---|
| RegEx | 0.006 s | |
| scanner | 0.036 s | |
| components | 0.038 s | |
| enumerate | 0.028 s | |
| reduce | 0.132 s | failure |

So for me the answer is that the regular expression is so fast that keeping a line start array might not be required.
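
For experimenting outside of an NSTextView, the regular expression approach can be wrapped in a free function. This is a sketch, not the code that was benchmarked: `lineAndColumn(in:atUTF16Offset:)` is a hypothetical name, and it assumes an empty selection (a plain caret) given as a UTF-16 offset.

```swift
import Foundation

/// Returns the one-based line number and the zero-based column (counted in
/// Characters) of a caret at a UTF-16 offset, or nil if the offset cannot
/// be converted to a position in `text`. Hypothetical helper.
func lineAndColumn(in text: String, atUTF16Offset offset: Int) -> (line: Int, column: Int)? {
    guard offset >= 0 else { return nil }
    let prefix = NSRange(location: 0, length: offset)
    guard let range = Range(prefix, in: text) else { return nil }

    // The pattern is a constant, so compiling it is not expected to fail.
    let regex = try! NSRegularExpression(pattern: "\n", options: [])
    let line = regex.numberOfMatches(in: text, options: [], range: prefix) + 1

    // Start of the line that contains the caret.
    let caret = range.upperBound
    let lineStart = text.lineRange(for: caret..<caret).lowerBound
    return (line, text.distance(from: lineStart, to: caret))
}

print(lineAndColumn(in: "ab\ncd", atUTF16Offset: 4) as Any)
```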

-1

Since the tabulated values were not so easy to interpret, I wrote a complete Swift application to do the benchmarking: "App benchmarking line counters".

  1. Everything was intended to be used with NSTextView, where the maximum number of lines I was able to process was limited to 400000 anyway (about 28 MB of text) and NSTextView was behaving sluggishly. Therefore I limited the size to 300000 lines.
  2. The solution using the regular expression requires only 30 ms (M2, Ventura) and 120 ms (Intel 2.6 GHz, High Sierra) for these 300000 lines of text.
  3. In principle the algorithms should work with multibyte texts, but I did not test this.
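
As a small check of the multibyte case (a sketch with a made-up string, not a test of the benchmark app): NSRange locations count UTF-16 code units, while `String.distance(from:to:)` counts Characters, so the column reported by the methods above is a Character count and can be smaller than the UTF-16 offset within the line.

```swift
import Foundation

// The emoji occupies two UTF-16 code units but is a single Character.
let emojiText = "👍 ok\nnext"

// Caret placed right after the emoji, expressed as an NSRange.
let emojiCaret = NSRange(location: 2, length: 0)

// Valid here because UTF-16 offset 2 lies on a Character boundary.
let emojiRange = Range(emojiCaret, in: emojiText)!
let emojiLineStart = emojiText.lineRange(for: emojiRange).lowerBound

let columnInCharacters = emojiText.distance(from: emojiLineStart, to: emojiRange.lowerBound)
let columnInUTF16 = emojiText.utf16.distance(from: emojiLineStart, to: emojiRange.lowerBound)

print(columnInCharacters, columnInUTF16)
```

Which of the two values is the right "column" depends on what the status display should mean; the Character count is usually what a user perceives.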

Therefore very likely I will accept my own answer.

  • This isn't an answer to your question. You should delete this "answer" and add it to your other, actual answer. – HangarRash Jun 08 '23 at 15:41
  • Why is this not an answer? The app even offers and analyses several possibilities. I was asking for a fast and efficient way, and the answer of Willeke triggered my search to find probably the fastest way. Could you please elaborate on why you think it is not an answer? – Lego Esprit Jun 09 '23 at 19:54
  • You had already posted an answer. Then you posted this answer. I thought this answer was just commentary that should have been part of either the original question or your first answer. Why post two answers to your own question? – HangarRash Jun 09 '23 at 20:13
  • Basically you are right, but 1. the listed methods are visible only in the 1st answer, but not for the 2nd answer (you need GitHub). 2. the 2nd answer is more complete, because 2 more methods are listed and it answers Alexander's doubts about the effectiveness for non-binary search. – Lego Esprit Jun 10 '23 at 11:20