What's the fastest/most efficient way to count lines in Rebol?

Question

Given a string string, what is the fastest/most-efficient way to count lines therein? Will accept best answers for any flavour of Rebol. I've been working under the assumption that the parse [some [thru]] combination was the fastest way to traverse a string, but then I don't know that for certain, hence turning to SO:

count-lines: func [string [string!] /local count][
    parse/all string [
        (count: 1) some [thru newline (count: count + 1)]
    ]
    count
]

Or:

count-lines: func [string [string!] /local count][
    count: 0
    until [
        count: count + 1
        not string: find/tail string newline
    ]
    count
]

And how about counters? How efficient is repeat?

count-lines: func [string [string!]][
    repeat count length? string [
        unless string: find/tail string newline [
            break/return count
        ]
    ]
]

Update: line count goes by the Text Editor principle:

Empty TextMate Document

An empty document still has a line count of one. So:

>> count-lines ""
== 1
>> count-lines "^/"
== 2

Also, any assistance testing the veracity of speed/efficiency claims would be welcome. — rgchris, Feb 08 '13 at 05:08
Rebol programmers *would* think that an empty string has 1 line. What's with this fear of zero? :-P — HostileFork says dont trust SE, Feb 08 '13 at 22:37
Another thing in any performance consideration is the nature of the input. I would study any technique you try with varied input...some examples: *empty string*, *"long string of all newlines"*, *"long string of zero newlines"*... — HostileFork says dont trust SE, Feb 10 '13 at 00:02

score 4 · Answer 1 · answered Feb 13 '13 at 12:12

4

count-lines: func [
    str
    /local sort-str ][
sort-str: sort join str "^/"
1 + subtract index? find/last sort-str "^/" index? find sort-str "^/"
]

answered Feb 13 '13 at 12:12

Darius

41
1

Interesting approach—any thoughts (anyone) how this performs? – rgchris Feb 16 '13 at 18:29
Interesting, Darius! BTW, we have an [SO chat room for Rebol](http://chat.stackoverflow.com/rooms/291/rebol-and-red) if you'd like to join us... – HostileFork says dont trust SE Feb 16 '13 at 19:04
Any modifying approaches are likely going to be slow for long strings because of the overhead of shifting the series over during the modifications. But the profiler will tell us which approach will be better. – BrianH Feb 19 '13 at 23:26

rebolek · Answer 2 · 2013-02-20T18:10:12.133

4

Enhanced PARSE version, as suggested by BrianH:

i: 1 ; add one as TextMate
parse text [any [thru newline (++ i)]]
print i

edited Feb 20 '13 at 18:10

answered Feb 19 '13 at 22:44

rebolek

1,281
7
16

1

Should that be `++ i`? *(didn't notice there was a `++` :)* – rgchris Feb 19 '13 at 23:12
Could also start `i: 0` and say `[newline | end] (++ i)`. – rgchris Feb 19 '13 at 23:14
Is there a benefit/penalty using `skip` instead of `thru`? With skip, the parse rule is repeated as many times as the length of the string. – rgchris Feb 19 '13 at 23:15
The parse approach is likely to be fastest, so I upvoted it. Still, it could use a little work. You might consider using `to` or `thru`, for instance. – BrianH Feb 19 '13 at 23:20
Chris, initializing to 0 and then doing that block will not only be slower, but would result in an endless loop in R2: `any [end]` will never stop at the end of the string in R2. But it would be slower because `parse` has to recurse into the nested block, and then backtrack through the alternate (the part after the `|`). So, it's better to initialize to 1 instead. It's faster to use `thru` than `skip`, and makes the rule simpler too (exercise for rebolek, but as a hint: you don't need `end`, or to backtrack). – BrianH Feb 19 '13 at 23:38
Of course, d'oh! My inclination is to have the count fully contained within the rule, but inclinations aren't always rational. – rgchris Feb 19 '13 at 23:43
1

If you want the count contained, move the initialization into the rule, like this: `(i: 1)` – BrianH Feb 19 '13 at 23:44
Well, I used `skip` instead of `thru` because I wrote this reply just to gain enough points to join chat... But I like `parse` and `parse` answer was missing, so I wrote something :) – rebolek Feb 20 '13 at 14:38
1

If I may one-lineify it: `count-lines: func [text [string!] /i][parse text [(i: 1) any [thru newline (++ i)]] i]` – rgchris Feb 20 '13 at 18:54
@rgchris You mean `/local i` I take it. :-) – HostileFork says dont trust SE Feb 21 '13 at 14:25
1

@HostileFork `/i` also creates a local. `/local` is a convention that has no formal meaning. Though I would use `/local` in a formal setting. – rgchris Sep 27 '13 at 16:49

score 3 · Answer 3 · edited Feb 16 '14 at 03:27

3

Here's the best simple non-parse version I can think of:

count-lines: function [text [string!]] [
    i: 1
    find-all text newline [++ i]
    i
]

It uses function and ++ from more recent versions of Rebol, and find-all from either R3 or R2/Forward. You could look at the source of find-all and inline what you find and optimize, but situations like this are exactly what we wrote find-all for, so why not use it?

edited Feb 16 '14 at 03:27

HostileFork says dont trust SE

32,904
11
98
167

answered Feb 20 '13 at 18:37

BrianH

2,186
16
14

Rebolek's `parse` solution is likely faster, but only a profiler can say for sure. – BrianH Feb 20 '13 at 18:38
Another interesting new function, thks! – rgchris Feb 20 '13 at 18:49
The `find-all` function is a good candidate to change to a native. – BrianH Feb 22 '13 at 01:22

score 2 · Answer 4 · answered Feb 08 '13 at 08:49

2

Here is the best for me:

temp: read/lines %mytext.txt
length? temp

answered Feb 08 '13 at 08:49

MaxV

588
1
4
13

1

The string source is not a file, do you mean to write the string to a file in order to read/lines? – rgchris Feb 08 '13 at 17:15
1

OK, then we'll have to add the overhead of writing out the value to the file, then reading it back. Or you can try `deline/lines`, which does the same thing in memory without needing a file. Watch out though: In R2, `deline/lines` is mezzanine, so it won't be as fast as some of the other solutions. In R3 it's native. Regardless, `read/lines` and `deline/lines` make a copy of the source string, so we'll have to see if that overhead overwhelms the advantage of being a single native call. I can confirm this approach works though. – BrianH Feb 20 '13 at 18:50
BriaH is a real programmer, I just propose solution. If you need time optimization or other hight technical solution, contact him. :-) – MaxV Feb 25 '13 at 11:49

score 2 · Answer 5 · answered Feb 18 '13 at 08:57

2

remove-each can be fast as it is native

s: "1^/2^/3"
a: length? s
print a - length? remove-each v s [v = #"^/"]
; >> 2

or as a function

>> f: func [s] [print [(length? s) - (length? remove-each v s [v = #"^/"])]]
>> f "1^/2^/3"
== 2

answered Feb 18 '13 at 08:57

endo64

2,269
2
27
34

I'd need to benchmark this against the `'parse` function above to see which is faster. I'd also worry that `'s` would need to be copied so as not to modify the original string thereby hampering efficiency. – rgchris Feb 18 '13 at 22:14
Style Notes: function returns the number of newlines, not the number of lines—should return `s + 1` per the 'Text Editor Principle' (see question). – rgchris Feb 18 '13 at 22:17
Note that in Rebol 3, REMOVE-EACH returns the removal count, not the modified series. So your suggestion can just be `remove-each v s [v = #"^/"]` in Rebol 3. (And that is actually the first use case I ever came across where the rather weird return value of REMOVE-EACH in R3 is useful.) – earl Sep 27 '13 at 18:12

score 2 · Answer 6 · answered Feb 21 '13 at 12:15

2

Why no one came with the simplest solution I wonder :)

t: "abc^/de^/f^/ghi"
i: 0 until [i: i + 1 not t: find/tail t newline] i
== 4

Not sure about the performance but I think it's quite fast, as UNTIL and FIND are natives. WHILE could be used as well.

i: 1 while [t: find/tail t newline] [i: i + 1] i
== 4

Just need to check for empty string. And if it would be a function, argument series needs to be HEADed.

answered Feb 21 '13 at 12:15

endo64

2,269
2
27
34

The `while` solution here is what I meant by my "inline `find-all` and optimize" comment. The `until` solution might be faster, since `until` is a simpler native. `i: i + 1` is faster than `++ i` in R2, but slower in R3. – BrianH Feb 22 '13 at 01:19

score 2 · Answer 7 · answered Sep 26 '13 at 15:04

2

Not the most efficient, but probably one of the fastest solution (anyway if a benchmark is run, I would like to see how this solution performs):

>> s: "1^/2^/ ^/^/3"
>> (length? s) - length? trim/with copy s newline
== 4

answered Sep 26 '13 at 15:04

DocKimbel

3,127
16
27

score 1 · Answer 8 · answered Feb 11 '13 at 01:39

1

Do not know about performance, and the last line rule (r3).

>> length? parse "1^/2^/3" "^/"
== 3

answered Feb 11 '13 at 01:39

dt2

183
6

My concern here is that it creates as many new strings as there is lines—it's a very concise one-liner, but I fear it does more than the `parse` example in the question. Would like to hear other opinions... – rgchris Feb 11 '13 at 19:37
The other downside to this approach is a quirk in using `parse` as `split`—it breaks under this condition: `parse {one^/"two^/three"} "^/"` – rgchris Feb 11 '13 at 19:40
Good points. i doubt the extra memory would be a concern. i cheat and do not benchmark the extra gc :) But the quote-handling is a bad surprise. – dt2 Feb 22 '13 at 20:09

shadwolf · Answer 9 · 2013-05-16T18:36:14.120

hehehe the read/lines length? temp is a great thing I though about read/lines -> foreach lines temps [ count: count + 1]

another way to do it would be to do

temp: "line 1 ^M line2 ^M  line3 ^M "
length? parse temp newline ; that cuts the strings into a block 
;of multiple strings that represent each a line [ "line 1" "line2" "line3" ] 
:then you count how much  strings you have in the block with length?

I like to code in rebol it is so funny

Edit I didnt read the whole post so my solution already waas proposed in a different way...

ok to amend for my sin of posting a already posted solution I will bring insight comment of a unexpected behavior of that solution. Multiple chained carriage returns are not counted (using rebol3 linux ...)

>> a: "line1 ^M line2 ^M line3 ^M^M"
== "line1 ^M line2 ^M line3 ^M^M"

>> length? parse a newline 
== 3

What's the fastest/most efficient way to count lines in Rebol?

9 Answers9

Linked