I have a file which contains data like this:

abc
abc, Iteration 1
abc
abc, Iteration 2
...
abc
abc, Iteration 19
abc
abc, Iteration 20

I would like to determine the number of lines between the lines that end exactly in the strings "Iteration 1" and "Iteration 2", and store that count in the variable numlines. In the example above, numlines should contain the value 1.

I would like to use wc -l, sed, or awk.

IslandPatrol
  • See http://stackoverflow.com/questions/38972736/how-to-select-lines-between-two-patterns for extracting the lines between the two patterns (excluding the pattern lines themselves), then pass the result on to `wc`, or use a counter within the `awk` solution itself. – Sundeep Oct 12 '16 at 02:09

3 Answers

Vijay's helpful sed answer is concise, but invariably processes the entire input file (and also creates extra child processes, because wc -l must be invoked as well - although that will hardly matter overall).

Try the following awk solution, which exits as soon as the end of the range is found. With large input files this may matter, depending on where in the file the range is located. It also creates only a single child process, because the subshell is optimized away in favor of the simple awk command:

numlines=$(awk '/Iteration 1$/ {b=NR; next} /Iteration 2$/ {print NR-b-1; exit}' file)

Tip of the hat to karakfa for helping to optimize the command.

Note: /Iteration 1$/ and /Iteration 2$/ are regular expressions that match strings Iteration 1 and Iteration 2 at the end of a line ($).
The strings at hand happen not to contain regular-expression metacharacters that need escaping (with \), but you may have to do so in other cases.
If the strings to match are not literals known in advance, generic escaping would be difficult; in that case, consider Ed Morton's solution, which is based on strings, not regular expressions.
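
To see why the `$` anchor matters, here is a minimal sketch run against a temporary sample file. The file and its contents are assumptions for illustration, modeled on the data in the question. Without the anchor, `Iteration 1` would also match a line such as `abc, Iteration 19`.

```shell
# Build a small sample file modeled on the question (contents are illustrative).
sample=$(mktemp)
printf '%s\n' \
  'abc' 'abc, Iteration 1' 'abc' 'abc, Iteration 2' \
  'abc' 'abc, Iteration 19' 'abc' 'abc, Iteration 20' > "$sample"

# Anchored range: only lines ending exactly in "Iteration 1" / "Iteration 2" count.
numlines=$(awk '/Iteration 1$/ {b=NR; next} /Iteration 2$/ {print NR-b-1; exit}' "$sample")

# For comparison: how many lines each form of the start pattern matches.
loose=$(awk '/Iteration 1/ {n++} END {print n+0}' "$sample")
anchored=$(awk '/Iteration 1$/ {n++} END {print n+0}' "$sample")

rm -f "$sample"
echo "numlines=$numlines loose=$loose anchored=$anchored"
```

On this sample the unanchored pattern matches one more line than the anchored one, because it also hits `Iteration 19`.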

mklement0
  • Why not `/Iteration 1$/{b=NR} /Iteration 2$/{print NR-b-1; exit}`? – karakfa Oct 12 '16 at 03:03
  • @EdMorton: Yes, the regular expressions used in the command match lines that `end exactly in the strings "Iteration 1" and "Iteration 2"`, as requested. – mklement0 Oct 12 '16 at 03:37
  • @EdMorton: Yes. Between the footnote I've added and the pointer to your answer, I think we've got everything covered. – mklement0 Oct 12 '16 at 04:01
sed '/Iteration 1$/,/Iteration 2$/!d;//d' filename | wc -l
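
To store the result in `numlines` as the question asks, the pipeline can be wrapped in a command substitution. The following is only a sketch under stated assumptions: it anchors both patterns with `$` so that `Iteration 1` does not also match `Iteration 19`, it strips the space padding that some `wc` implementations emit, and it runs against a hypothetical temporary sample file.

```shell
# Illustrative sample file modeled on the question's data.
sample=$(mktemp)
printf '%s\n' \
  'abc' 'abc, Iteration 1' 'abc' 'abc, Iteration 2' \
  'abc' 'abc, Iteration 19' 'abc' 'abc, Iteration 20' > "$sample"

# !d deletes every line outside the range; //d then deletes the two boundary
# lines (the empty regex reuses the last regex applied). wc -l counts the rest.
# tr removes any space padding in the wc output.
numlines=$(sed '/Iteration 1$/,/Iteration 2$/!d;//d' "$sample" | wc -l | tr -d ' ')

rm -f "$sample"
echo "$numlines"
```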
Vijay

All the solutions so far use regexps, not strings, and so will fail when your strings contain RE metacharacters. This is how to do what you want with strings as you asked for in your question:

$ awk '
BEGIN  {
    begStr = "Iteration 1"
    endStr = "Iteration 2"
}
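# Compare the tail of each line literally. Unlike index(), which reports only
# the first occurrence, substr() always inspects the end of the line.
substr($0, 1 + length($0) - length(begStr)) == begStr { begNr = NR }
substr($0, 1 + length($0) - length(endStr)) == endStr { print NR - begNr - 1 }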
' file
1
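
If the strings need to come from outside the script, one option is to pass them through the environment and read them with `ENVIRON`, which, unlike `-v`, does not process backslash escapes. The following is a hedged sketch rather than part of the original answer: the sample file, the strings `path\to [1]` and `path\to [2]`, and the use of `substr()` for the end-of-line test are all assumptions chosen for illustration.

```shell
# Illustrative sample whose marker strings contain a backslash and brackets,
# which would be misread as an escape sequence or as regex syntax.
sample=$(mktemp)
printf '%s\n' 'abc' 'abc, path\to [1]' 'abc' 'abc' 'abc, path\to [2]' > "$sample"

# begStr/endStr are exported only to this awk invocation and read verbatim.
numlines=$(begStr='path\to [1]' endStr='path\to [2]' awk '
BEGIN {
    begStr = ENVIRON["begStr"]
    endStr = ENVIRON["endStr"]
}
# Literal comparison of the last length(str) characters of the line.
substr($0, 1 + length($0) - length(begStr)) == begStr { begNr = NR; next }
substr($0, 1 + length($0) - length(endStr)) == endStr { print NR - begNr - 1; exit }
' "$sample")

rm -f "$sample"
echo "$numlines"
```

The `next` and `exit` statements stop processing as early as possible, as suggested in the comments on this answer.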
Ed Morton
  • While I don't think passing arbitrary strings to match was in the scope of the question (casual wording notwithstanding), ++ for a more generic solution. For optimization, consider appending `next` to the 1st action, and `exit` to the second. – mklement0 Oct 12 '16 at 03:48
  • Also, to better illustrate the advantage of your approach, I suggest showing how to pass the strings _via variables from the outside_: `awk -v begStr='Iteration 1' -v endStr='Iteration 2' '...'` – mklement0 Oct 12 '16 at 03:57
  • I didn't do that because then I'd have to explain/handle backslash expansion, and I don't know if the OP needs a solution like that. I also don't want to waste time optimizing it since the OP probably won't use it anyway; I just felt it was important for anyone coming across this question in the future to see how to really use strings instead of regexps. – Ed Morton Oct 12 '16 at 04:00