How to determine number of lines between two strings using Bash and standard utilities?

Question

I have a file which contains data like this:

abc
abc, Iteration 1
abc
abc, Iteration 2
...
abc
abc, Iteration 19
abc
abc, Iteration 20

I would like to determine the number of lines between the lines which end exactly in the strings "Iteration 1" and "Iteration 2", and store that number in the variable numlines. In the example above, numlines should contain the value 1.

I would like to use wc -l, sed, or awk.

check out http://stackoverflow.com/questions/38972736/how-to-select-lines-between-two-patterns for extracting the lines between the two patterns (excluding the patterns as well) and pass it on to `wc` or probably use a counter within `awk` solution itself — Sundeep, Oct 12 '16 at 02:09

score 4 · Accepted Answer · edited May 23 '17 at 10:34

Vijay's helpful sed answer is concise, but invariably processes the entire input file (and also creates extra child processes, because wc -l must be invoked as well - although that will hardly matter overall).

Try the following awk solution, which exits as soon as the end of the range is found (it also creates only a single child process - the subshell is optimized away in favor of the simple awk command); with large input files, this may matter, depending on where inside the file the range is positioned:

numlines=$(awk '/Iteration 1$/ {b=NR; next} /Iteration 2$/ {print NR-b-1; exit}' file)

Tip of the hat to karakfa for helping to optimize the command.

Note: /Iteration 1$/ and /Iteration 2$/ are regular expressions that match the strings Iteration 1 and Iteration 2 at the end of a line ($).

The strings at hand happen not to contain regular-expression metacharacters that need escaping (with \), but you may have to do so in other cases.

If the strings to match are not literals known in advance, generic escaping would be difficult; in that case, consider Ed Morton's solution, which is based on strings, not regular expressions.
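
As a quick check, the command can be run against a small file shaped like the one in the question, and against a second file whose markers contain a regex metacharacter. This is only a sketch: the file names sample.txt and versions.txt, their contents, and the variable name between are assumptions made for illustration.

```shell
# Hypothetical sample file, modeled on the data in the question.
cat > sample.txt <<'EOF'
abc
abc, Iteration 1
abc
abc, Iteration 2
abc
abc, Iteration 10
abc
abc
abc, Iteration 20
EOF

# Same command as above; the $ anchors keep "Iteration 10" and
# "Iteration 20" from being mistaken for the range boundaries.
numlines=$(awk '/Iteration 1$/ {b=NR; next} /Iteration 2$/ {print NR-b-1; exit}' sample.txt)
echo "$numlines"

# Hypothetical markers containing a dot: escape it so that it matches
# a literal "." rather than any character.
printf '%s\n' 'run v1.0' 'x' 'y' 'run v2.0' > versions.txt
between=$(awk '/v1\.0$/ {b=NR; next} /v2\.0$/ {print NR-b-1; exit}' versions.txt)
echo "$between"
```

With these sample files, the first command should report 1 line and the second 2 lines.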

why not `/Iteration 1$/{b=NR} /Iteration 2$/{print NR-b-1; exit}` — karakfa, Oct 12 '16 at 03:03
@EdMorton: Yes, the regular expressions used in the command match lines that `end exactly in the strings "Iteration 1" and "Iteration 2"`, as requested. — mklement0, Oct 12 '16 at 03:37
@EdMorton: Yes. Between the footnote I've added and the pointer to your answer, I think we've got everything covered. — mklement0, Oct 12 '16 at 04:01

score 3 · Answer 2 · edited Oct 12 '16 at 02:38

sed '/Iteration\ 1/,/Iteration\ 2/!d;//d' filename  | wc -l

edited Oct 12 '16 at 02:38 by mklement0 · answered Oct 12 '16 at 02:20 by Vijay

Much more concise than mine :) – tink Oct 12 '16 at 02:22
This seems to break when the file contains "Iteration 10" and "Iteration 20", etc. – IslandPatrol Oct 12 '16 at 02:26
@mklement0 I have updated the sample file in my question to reflect the nature of the data the solution should handle. – IslandPatrol Oct 12 '16 at 02:50
A clever and concise solution; please consider adding an explanation. The only down-side is that the _entire_ input file is invariably processed. – mklement0 Oct 12 '16 at 03:08
Will fail as @IslandPatrol mentioned and in other situations and why are you escaping spaces? Spaces are not regexp metacharacters. – Ed Morton Oct 12 '16 at 03:45
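
One way to address the "Iteration 10" problem raised in these comments, while keeping the same sed approach, is to anchor both patterns to the end of the line and drop the unneeded backslashes. This is a sketch rather than the answerer's own revision; the sample file name and contents are assumptions.

```shell
# Hypothetical sample file, modeled on the data in the question.
printf '%s\n' 'abc' 'abc, Iteration 1' 'abc' 'abc, Iteration 2' \
              'abc' 'abc, Iteration 10' 'abc' 'abc, Iteration 20' > sample_sed.txt

# !d deletes every line outside the range. The empty regex // reuses the
# most recently applied regex, so //d removes the two boundary lines too.
# Piping into wc keeps the file name out of its output.
numlines=$(sed '/Iteration 1$/,/Iteration 2$/!d;//d' sample_sed.txt | wc -l)
numlines=$((numlines))   # arithmetic expansion drops padding that some wc versions emit
echo "$numlines"
```

Like the original command, this still reads the whole file, and it still treats the markers as regular expressions.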

score 1 · Answer 3 · answered Oct 12 '16 at 03:44

All the solutions so far use regexps, not strings, and so will fail when your strings contain RE metacharacters. This is how to do what you want with strings as you asked for in your question:

$ awk '
BEGIN  {
    begStr = "Iteration 1"
    endStr = "Iteration 2"
}
index($0,begStr) == 1 + length($0) - length(begStr) { begNr = NR }
index($0,endStr) == 1 + length($0) - length(endStr) { print NR - begNr - 1 }
' file
1

answered Oct 12 '16 at 03:44 by Ed Morton

While I don't think passing arbitrary strings to match was in the scope of the question (casual wording notwithstanding), ++ for a more generic solution. For optimization, consider appending `next` to the 1st action, and `exit` to the second. – mklement0 Oct 12 '16 at 03:48
Also, to better illustrate the advantage of your approach, I suggest showing how to pass the strings _via variables from the outside_: `awk -v begStr='Iteration 1' -v endStr='Iteration 2' '...'` – mklement0 Oct 12 '16 at 03:57
I didn't do that because then I'd have to explain/handle backslash expansion, and I don't know if the OP needs a solution like that. I also don't want to spend time optimizing it since the OP probably won't use it anyway; I just felt it was important for anyone coming across this question in the future to see how to really use strings instead of regexps. – Ed Morton Oct 12 '16 at 04:00
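
To take the markers from the shell without running into the backslash expansion mentioned above, one option is to pass them through the environment and read them from awk's ENVIRON array, which, unlike -v assignments, does not interpret escape sequences. The following is a sketch under assumptions, not the answerer's code: the function name count_between, the helper endsWith, and the sample data are hypothetical. It compares the line's suffix with substr instead of index, which also covers lines where the string occurs earlier in the line as well as at the end, and it adds the next and exit suggested in the comments.

```shell
# count_between BEGIN_STR END_STR FILE   (hypothetical helper)
# Prints the number of lines strictly between the last line ending in
# BEGIN_STR and the first following line ending in END_STR.
# Both markers are treated as literal strings, not regular expressions.
count_between() {
    begStr=$1 endStr=$2 awk '
        function endsWith(line, s,    n) {
            # n is the offset at which s would have to start.
            n = length(line) - length(s)
            return n >= 0 && substr(line, n + 1) == s
        }
        endsWith($0, ENVIRON["begStr"])          { begNr = NR; next }
        begNr && endsWith($0, ENVIRON["endStr"]) { print NR - begNr - 1; exit }
    ' "$3"
}

# Usage with markers containing a backslash and regex metacharacters.
printf '%s\n' 'a' 'C:\temp [1]' 'b' 'c' 'C:\temp [2]' > meta.txt
numlines=$(count_between 'C:\temp [1]' 'C:\temp [2]' meta.txt)
echo "$numlines"
```

The begNr guard on the second rule is a design choice of this sketch: nothing is printed if the end marker appears before any begin marker.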
