Using awk to detect // as an end of headers marker

Question

Currently I am parsing a script with the following command

ions and in the file it should only be from the first numeric (in this case five). The first file always starts with the pattern [numeric].

Just a quick further task: is there maybe some additional way to grep the numerics inh all these numbers in there in the same go?

@mat that is a great answer. Is there maybe a way to create the fourth file as mentioned in the comment to your answer? – heinheo, May 04 '15 at 06:23

Mat · Answer 1 · 2015-05-04T07:00:07.763

You can do this with a very simple state machine - only two states: header or body.

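# Reset the state whenever awk moves on to a new input file.
curfile != FILENAME  {body=0; curfile=FILENAME}
# "//" on a line by itself ends the header.
!body && /^\/\/$/    {body=1}
# Inside the body, route each kind of line to its own output file.
body  && /^\[/       {print > ("first_" FILENAME)}
body  && /^(seg|pos)/{print > ("second_" FILENAME)}
body  && /^[01]+/    {print > ("third_" FILENAME)}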

This starts by setting body to zero/false whenever the filename changes (curfile will initially be unset), and switches that to one/true when the header separator is seen. The other rules only apply inside the body.
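As a rough end-to-end check of the two-state idea, the sketch below runs the same rules over two small input files. The file names (`sample1.txt`, `sample2.txt`) and their contents are made up for illustration and are not taken from the question. The parentheses around the redirection targets are an addition for portability, since some awk implementations parse an unparenthesized concatenation after `>` differently.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so the output names stay relative
# ("first_" FILENAME would not be a valid path if FILENAME contained a directory).
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical inputs: everything before the "//" line is header.
cat > sample1.txt <<'EOF'
header line
[9] looks like a body line but is still header
//
[5]first tree
segsites: 2
pos: 0.1 0.7
01
10
EOF

cat > sample2.txt <<'EOF'
another header
0101
//
[3]tree
segsites: 1
pos: 0.4
1
EOF

awk '
# New input file: go back to the "header" state.
curfile != FILENAME   { body = 0; curfile = FILENAME }
# The separator line switches to the "body" state.
!body && /^\/\/$/     { body = 1 }
# Body lines are routed by their leading characters.
body && /^\[/         { print > ("first_"  FILENAME) }
body && /^(seg|pos)/  { print > ("second_" FILENAME) }
body && /^[01]+/      { print > ("third_"  FILENAME) }
' sample1.txt sample2.txt

# Show one of the results; the header line "0101" of sample2.txt is not included.
cat third_sample2.txt
```

The reset rule is what keeps the header of `sample2.txt` out of the output: without it, `body` would still be 1 from the previous file.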

To extract the first bracketed number from the first group of lines, the pattern is simple enough that the substr and index string functions will do. Something like the following should work:

body  && /^\[[0-9]+\]/ {
  print > ("first_" FILENAME)
  # Text between the leading "[" and the first "]".
  print substr($0, 2, index($0,"]")-2) > ("fourth_" FILENAME)
}
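To see what the `substr`/`index` expression yields, here is a small sketch on a made-up input file (`trees.txt` and its contents are assumptions for illustration only). For a line starting with `[123]`, `index($0, "]")` is 5, so `substr($0, 2, 3)` returns `123`.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input; the first bracketed line sits in the header and is skipped.
cat > trees.txt <<'EOF'
[7]ignored header line
//
[5](1:0.1,2:0.1);
[123](1:0.2,2:0.2);
EOF

awk '
!body && /^\/\/$/ { body = 1 }
body && /^\[[0-9]+\]/ {
  print > ("first_" FILENAME)
  # Start at character 2 (just after "[") and stop before the first "]".
  print substr($0, 2, index($0, "]") - 2) > ("fourth_" FILENAME)
}
' trees.txt

cat fourth_trees.txt
# prints:
# 5
# 123
```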

edited May 04 '15 at 07:00

answered May 04 '15 at 06:03

Mat

You don't even have to set `body` to zero; a variable that isn't (yet) set defaults to false when interpreted as a boolean. – chw21 May 04 '15 at 06:20
@heinheo: you want the first `[number]` of each of those lines in a separate file, or all of the numbers between brackets? – Mat May 04 '15 at 06:24
@mat could you let me know how I could feed more than a single file into that framework at once... I know you use the filename option, but where do I feed my list.txt in? list.txt contains the filenames of all the files I want to parse. – heinheo May 04 '15 at 06:46
If `list.txt` contains the names of the files you want to parse, the command syntax you have in your question already does what you want - the `$( – Mat May 04 '15 at 06:50
Ah, sorry hadn't understood. See update, that should do the trick, @heinheo – Mat May 04 '15 at 07:00
@mat everything works excellently, except that the second file should start with the first number after pos: ... currently it starts with segsites, then a newline, then pos, and only then the number... can we fix that somehow? – heinheo May 04 '15 at 07:05
@heinheo: yes you can fix that quite easily. If you don't want segsite printed, don't match on it. If you don't want the pos printed, use a substring or one of the tricks [here](http://stackoverflow.com/questions/4198138/printing-everything-except-the-first-field-with-awk) – Mat May 04 '15 at 07:08
@mat does that mean I would just write what instead of /^(seg|pos)/ ...I tried /^(pos)/ awk '{first = $1; $1 = ""; print $0, first; }' but that did not work – heinheo May 04 '15 at 07:19
@heinheo: why are you trying to save the first item if you don't want it? `/^pos/{$1="";print > [...]`. – Mat May 04 '15 at 07:21
@mat could you maybe just update the answer in case somebody comes across a similar question? – heinheo May 04 '15 at 07:29
@heinheo: it is extremely unlikely that someone with the exact same requirement comes along. My answer gives the general technique(s). I can't list all awk functions and tools. If you need more processing on specific lines, search for the specific things you need for those lines. – Mat May 04 '15 at 07:31
@mat I added body && /^pos/{$1="";print > "second_"FILENAME} as a second line in the file but now it does not work anymore... is the } after FILENAME incorrect? – heinheo May 04 '15 at 07:40
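
For readers following the comment thread, one possible way to write only the numbers after `pos` is sketched below. It assumes the label is the first whitespace-separated field of the line, and the sample file is invented for illustration. Clearing `$1` leaves a leading space when awk rebuilds the line, so the sketch strips it with `sub`; the `segsites` line is simply not matched.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input; the "pos:" line in the header must not be written out.
cat > sample.txt <<'EOF'
pos: 9.9 in the header, must be ignored
//
segsites: 2
pos: 0.1 0.7
01
EOF

awk '
!body && /^\/\/$/ { body = 1 }
body && /^pos/ {
  $1 = ""          # drop the label; awk rebuilds $0 with a leading space
  sub(/^ /, "")    # remove that leading space
  print > ("second_" FILENAME)
}
' sample.txt

cat second_sample.txt
# prints: 0.1 0.7
```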
