
The following grep command gives me the number of requests from July 1st to July 31st between 8 a.m. and 4 p.m.

zgrep -E "[01\-31]/Jul/2021:[08\-16]" localhost_access.log* | wc -l

I don't want the total for the whole month, but the number of requests per day. I could of course run the command 31 times, but that's tedious. Is there a way to display the requests per day one below the other (ideally sorted by number), for example:

543

432

321

etc.

How can I do that?

steffen
ng-User
  • [MCVE](https://stackoverflow.com/help/minimal-reproducible-example) – Thor Aug 02 '21 at 14:50
  • Your regex doesn’t actually work - see [What is the difference between square brackets and parentheses in a regex?](https://stackoverflow.com/q/9801630/256196) – Bohemian Aug 02 '21 at 14:53
  • The first one is irrelevant, the second range is `(0[8-9])|(1[0-6])`. – steffen Aug 02 '21 at 15:02
  • @Bohemian The only answer is wrong and yet accepted and we can't add other answers. It's totally clear what the author wants so it doesn't need any debugging. Closing this question was not helpful at all :-( – steffen Aug 04 '21 at 11:15
  • @bryan Your regex is wrong and it's reading the files 31 times (instead of once). – steffen Aug 04 '21 at 11:12
  • @steffen L yes, the answer is wrong, but because it didn’t actually answer the question (which asked for a regex, not an alternative solution), I’ve converted it to a comment, which leaves the site in a “most helpful” state. The question however lacks sample input data. Perhaps you can address that in an answer? – Bohemian Aug 04 '21 at 13:19
  • I mean, to be fair - they seemed confident their code worked, so I just added a loop around it. "I don't want to get all requests in the month, but the requests per day." - I'm not sure how my answer didn't help with that, but oh well. –  Aug 04 '21 at 18:35
  • And there was a comment on my answer "The regex is wrong and it's reading the files 31 times (instead of once)." - does it say somewhere in the question that it needs to only read the file once? If the regex is wrong, okay - that's fair - I should have checked it, I suppose, before copying and pasting blindly from the question. –  Aug 04 '21 at 18:36
  • @BryanHeden I was merely complaining about closing this question. I had written my answer and simply couldn't post it. Next day, yours was the accepted one. Look, the regex was wrong, yes, and the loop would be reading files 30 times more than needed - ok. But then there's also this wildcard in the command line... And zgrep indicates gzipped files. In total, that's a lot! – steffen Aug 04 '21 at 19:31

1 Answer


You want to count lines grouped by a value that appears in each line (the day). That's a good job for awk. With grep alone, you would have to process the input files once per day. Either way, we need to fix your regex first:

zgrep -E "[01\-31]/Jul/2021:[08\-16]" localhost_access.log* | wc -l

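[08\-16] is a bracket expression, so it matches a single character out of 0, 8, -, 1 and 6, not a number between 08 and 16. What you want to match is (0[89]|1[0-6]); that is, 0 followed by 8 or 9, or 1 followed by a digit in the range 0-6. The outer parentheses matter: without them, the | would split the whole pattern into two separate alternatives. To keep it simple, we assume valid days in the date and therefore match the day with [0-9]{2} (two digits).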
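To make the grouping issue concrete, here is a small sketch that counts matching lines with grep -E for both forms of the pattern. The sample lines (including the trailing "request NN" numbers) are made up for illustration; they are not taken from a real log.

```shell
# Made-up sample lines: hours 07, 08, 16 and 17, each with an unrelated number at the end.
sample='01/Jul/2021:07 request 15
01/Jul/2021:08 request 20
01/Jul/2021:16 request 30
01/Jul/2021:17 request 40'

# Grouped alternation: the hour right after the date must be 08-09 or 10-16.
grouped=$(printf '%s\n' "$sample" | grep -cE '[0-9]{2}/Jul/2021:(0[89]|1[0-6])')

# Ungrouped alternation: "date + 08/09" OR "10-16 anywhere in the line",
# so the 07 line also matches because of the unrelated "15".
ungrouped=$(printf '%s\n' "$sample" | grep -cE '[0-9]{2}/Jul/2021:(0[89])|(1[0-6])')

echo "grouped: $grouped, ungrouped: $ungrouped"   # prints "grouped: 2, ungrouped: 3"
```

The ungrouped form over-counts as soon as any line contains a number from 10 to 16 outside the timestamp, which is common in access logs (status codes, sizes, ports).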

Here's a complete awk for your task:

awk -F/ '/[0-9]{2}\/Jul\/2021:(0[89]|1[0-6])/{a[$1]++}END{for (i in a) print "day " i ": " a[i]}' localhost_access.log*

Explanation:

  • /[0-9]{2}\/Jul\/2021:(0[89]|1[0-6])/ matches date + hour for every day in July 2021 where the hour is 08 to 16
  • {a[$1]++} builds an array keyed by day that counts the matching lines.
  • END{for (i in a) print "day " i ": " a[i]} prints the array once all input files have been processed. The iteration order of for (i in a) is unspecified, so the days may not come out sorted.

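Because the field separator is /, $1 is everything before the first slash. In the example below that is exactly the day. If your log lines contain more text or more slashes before the date, you need to change a[$1] to address the correct position (for two more slashes before the actual date: a[$3]). (Of course this can be solved in a more dynamic way.)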
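One possible "more dynamic" variant is to let awk's match() locate the date and cut the day out of the match, so the field position no longer matters. This is only a sketch: the log lines below are hypothetical and merely resemble a common access-log layout, and the day is assumed to be the first two characters of the matched date.

```shell
# Hypothetical access-log lines (made-up data) with text and slashes around the date.
sample='127.0.0.1 - - [01/Jul/2021:08:15:00 +0200] "GET /a/b HTTP/1.1" 200 12
127.0.0.1 - - [01/Jul/2021:17:00:00 +0200] "GET /a/b HTTP/1.1" 200 12
127.0.0.1 - - [02/Jul/2021:09:30:00 +0200] "GET /c HTTP/1.1" 200 12'

per_day=$(printf '%s\n' "$sample" | awk '
  # match() sets RSTART to the position where the date starts, so no field index is needed.
  match($0, /[0-9][0-9]\/Jul\/2021:(0[89]|1[0-6])/) {
    day = substr($0, RSTART, 2)   # first two characters of the match are the day
    count[day]++
  }
  END { for (d in count) print "day " d ": " count[d] }
' | sort)

echo "$per_day"
```

With this sample, the 17:00 request is skipped and days 01 and 02 are counted once each. The pattern uses [0-9][0-9] instead of [0-9]{2} only so that it does not depend on interval-expression support in the awk at hand.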

Example:

$ cat localhost_access.log
01/Jul/2021:08 log message
01/Jul/2021:08 log message
02/Jul/2021:08 log message
02/Jul/2021:07 log message
$ awk -F/ '/[0-9]{2}\/Jul\/2021:(0[89]|1[0-6])/{a[$1]++}END{for (i in a) print "day " i ": " a[i]}' localhost_access.log*
day 01: 2
day 02: 1
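The question also asks for the result sorted by number. A minimal way to get that, sketched here with made-up sample data, is to print the count first and pipe the output through sort -rn (numeric, descending):

```shell
# Made-up sample data: day 01 has 2 hits, day 02 has 1, day 03 has 3 (plus one at 07, outside the window).
sample='01/Jul/2021:08 log message
01/Jul/2021:09 log message
02/Jul/2021:08 log message
03/Jul/2021:10 log message
03/Jul/2021:11 log message
03/Jul/2021:12 log message
03/Jul/2021:07 log message'

sorted=$(printf '%s\n' "$sample" | awk -F/ '
  /[0-9][0-9]\/Jul\/2021:(0[89]|1[0-6])/ { a[$1]++ }
  # Print the count first so that sort can order the lines numerically.
  END { for (i in a) print a[i], "day " i }
' | sort -rn)

echo "$sorted"
# 3 day 03
# 2 day 01
# 1 day 02
```

Sorting outside awk keeps the awk program short and works regardless of the order in which for (i in a) visits the keys.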

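If your log files are compressed, decompress them with zcat and pipe the result into awk (zcat localhost_access.log* | awk ...) instead of passing the file names to awk. Also remember that the regex above matches "Jul/2021" only.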
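As a sketch of that pipeline, the following builds two small made-up log files in a temporary directory, one plain and one gzipped, and reads both in a single pass. It uses gzip -cdf rather than plain zcat: -cd is what zcat does, and with GNU gzip the additional -f passes files that are not compressed through unchanged. That behaviour is an assumption about your gzip, so check it if you use a different implementation. The file names are hypothetical.

```shell
# Work in a throwaway directory with made-up log data.
tmpdir=$(mktemp -d)
printf '01/Jul/2021:08 log message\n01/Aug/2021:08 log message\n' > "$tmpdir/localhost_access.log"
printf '02/Jul/2021:09 log message\n02/Jul/2021:07 log message\n' > "$tmpdir/localhost_access.log.1"
gzip "$tmpdir/localhost_access.log.1"   # becomes localhost_access.log.1.gz

# Decompress (or pass through) every file once, then count per day in awk.
counts=$(gzip -cdf "$tmpdir"/localhost_access.log* | awk -F/ '
  /[0-9][0-9]\/Jul\/2021:(0[89]|1[0-6])/ { a[$1]++ }
  END { for (i in a) print "day " i ": " a[i] }
' | sort)

echo "$counts"
rm -rf "$tmpdir"
```

Here the August line and the 07:00 line are ignored, so each of the two July days is counted once.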

steffen