0

I'll preface this with the fact that I have no knowledge of awk (or maybe it's sed I need?) and fairly basic knowledge of grep and Linux, so apologies if this is a really dumb question. I find the man pages really difficult to decipher, and googling has gotten me quite far in my solution but not far enough to tie the two things I need to do together. Onto the problem...

I have some log files that I'm trying to extract rows from that are on a Linux server, named in the format aYYYYMMDD.log, that are all along the lines of:

Starting Process A
Wed 27 Oct 18:15:39 BST 2021 >>> /dir/task1 start <<<
...
Wed 27 Oct 18:15:40 BST 2021 >>> /dir/task1 end <<<
Wed 27 Oct 18:15:40 BST 2021 >>> /dir/task2 start <<<
...
Wed 27 Oct 18:15:42 BST 2021 >>> /dir/task2 end <<<
...
...
Wed 27 Oct 18:15:53 BST 2021 >>> /dir/taskreporting start <<<
...
Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
...
Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
...
Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
...
Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
...
Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<
...
Wed 27 Oct 18:16:27 BST 2021 >>> /dir/taskreporting end <<<
...
Ended Process A

(I've excluded the log rows which are irrelevant to my requirement; )

I need to find what tasks were run during the taskreporting task, which I have managed to do with the following command (thanks to this other stackoverflow post):

awk '/taskreporting start/{flag=1;next}/taskreporting end/{flag=0}flag' <specific filename>.log | grep 'Starting task\|Finishing task'

This works well when I run it against a single file and produces output like:

Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<

which is pretty much what I want to see. However, as I have multiple files to extract (having amended the filename in the above command appropriately, e.g. to *.log), I need to output the filename alongside the rows, so that I know which file the info belongs to, e.g. I'd like to see:

a211027.log Wed 27 Oct 18:15:53 BST 2021 >>> Starting task90 <<<
a211027.log Wed 27 Oct 18:15:54 BST 2021 >>> Finishing task90 <<<
a211027.log Wed 27 Oct 18:15:54 BST 2021 >>> Starting task91 <<<
a211027.log Wed 27 Oct 18:15:57 BST 2021 >>> Finishing task91 <<<
...
a211027.log Wed 27 Oct 18:16:12 BST 2021 >>> Starting task99 <<<
a211027.log Wed 27 Oct 18:16:27 BST 2021 >>> Finishing task99 <<<

I've googled and it seems like {print FILENAME} is what I need, but I couldn't figure out where to add it into my current awk command. How can I amend my awk command to get it to add the filename to the beginning of the rows? Or is there a better way of achieving my aim?

Boneist
  • 22,910
  • 1
  • 25
  • 40
  • 3
    `awk '/taskreporting start/{flag=1;next}/taskreporting end/{flag=0}flag {print FILENAME, $0}' .log` – HatLess Oct 28 '21 at 13:09
  • @HatLess ah, that's perfect - thank you so much! If you make that an actual answer, I'll accept it – Boneist Oct 28 '21 at 13:22

1 Answers1

1

As you have provided most of the answer yourself, all that is needed is {print FILENAME, $0} which will add the filename in front of the rest of the content $0

awk '/taskreporting start/{flag=1;next}/taskreporting end/{flag=0}flag  {print FILENAME, $0}'  <specific filename>.log
HatLess
  • 10,622
  • 5
  • 14
  • 32
  • 1
    fantastic - thanks! I did try adding the `{print FILENAME}` in at the start and in the middle of the pattern but it just stopped producing any output at all; I didn't think to try adding it at the end! And the `$0` had cropped up in another answer that I'd seen somewhere, and I thought would probably be relevant to my answer but where and how? And now I know, thanks to you :) – Boneist Oct 28 '21 at 13:37