0

I am using FreeBSD (on a Citrix NetScaler) and need to extract the Mbps values from a log that has hundreds of thousands of lines.

The log looks something like this, where the Mbps number is a decimal that can range from 0.0 to 9999.99 or more, e.g.:

#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (1.0)… alphatext_anylength... (more_alphatext_in brackets)... 
#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (500.15)… alphatext_anylength... (more_alphatext_in brackets)... 
#>alphatext_anylength... (more_alphatext_in brackets)... Mbps (1500.01)… alphatext_anylength... (more_alphatext_in brackets)... 

Now the challenge: I want to print only the bracketed Mbps values that are A) greater than 500 Mbps, B) together with their line numbers. For the sample above, I want to see only the following:

#>[line number 20] 500.15
#>[line number 55] 1500.01

I have tried:

cat output.log | sed -n -e 's/^.*Mbps//p' |cut -c 3-10

This prints characters 3 through 10 of whatever follows Mbps, but it is not smart enough to show only the bracketed decimal numbers that are greater than 500 Mbps.

I appreciate this might be a bit of a challenge... however, I would be grateful to any bash script wizards out there who can create magic!

Thanks in advance!

Rob
Newbie
  • Use `awk`, not `sed`. The line numbers and the 'greater than 500' parts say "not good for `sed`". Unless the line numbers are already present in the `#` bit...it's hard to guess sometimes what you mean when the data is faked so much. A couple of lines with semi-legitimate data would make it easier to see. The greater than condition still militates against using `sed`. It can be done with gruesome regexes, but it ain't nice. – Jonathan Leffler May 20 '20 at 02:02
  • Thanks Jonathan, appreciate the suggestions. – Newbie May 21 '20 at 16:33

5 Answers

1
$ awk '{match($0,/Mbps \(([^)]*)\)/,a);if(a[1] > 500){print NR,a[1]} }' ./infile
2 500.15
3 1500.01
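
The three-argument form of `match` used above is a GNU awk extension. A minimal sketch of the same idea for other awks, assuming only the POSIX `match`/`RSTART`/`RLENGTH` interface, could look like the following. The function name `mbps_over` and the sample lines are made up for illustration; the optional space after `Mbps` is there because the real `nsconmsg` output quoted in the comments has none.

```shell
# Hypothetical helper: print "line_number value" for each Mbps value above
# a threshold (first argument, default 500). Reads the log on stdin.
mbps_over() {
  awk -v min="${1:-500}" '
    match($0, /Mbps ?\([^)]*\)/) {
      # RSTART and RLENGTH delimit the matched "Mbps (...)" text
      s = substr($0, RSTART, RLENGTH)
      sub(/^Mbps ?\(/, "", s)   # strip the leading "Mbps ("
      sub(/\)$/, "", s)         # strip the trailing ")"
      if (s + 0 > min + 0) print NR, s
    }'
}

# Made-up sample lines in the shape shown in the question
printf '%s\n' \
  'text (note) Mbps (1.0) text (note)' \
  'text (note) Mbps (500.15) text (note)' \
  'text (note) Mbps (1500.01) text (note)' | mbps_over 500
# prints:
# 2 500.15
# 3 1500.01
```

On the real system the function would sit at the end of a pipeline, e.g. `... | mbps_over 50`.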
SiegeX
  • GNU `awk` is required in order to use the `match` function in this way. –  May 20 '20 at 12:49
  • @NazrulIslam Sure thing, FYI this 1-liner does **not** depend on the location of the parens `()` and will work even if the parens move or more are added as it keys off the occurrence of `Mbps` – SiegeX May 21 '20 at 18:03
  • Please excuse the delay – was waiting for another 500mbps data to feed into the logs, but no show. So have decided to test the scripts against 50mbps: – Newbie May 27 '20 at 23:39
  • As you can see from the simple grep filter output from log.199, there are tons of 50mbps lines results root@server# nsconmsg -K log.199 -s ConLb=1 -d oldconmsg | grep -i "Mbps(50" VIP(0.0.0.0:0:UP:LEASTCONNS): Hits(89266495, 63/sec) Mbps(50.62) Pers(SOURCEIP) Err(0) SO(0) LConn_Best [Idx:SubIdx] 57:0 PrimVserverDownBackupHits(4275) VIP(0.0.0.0:0:UP:LEASTCONNS): Hits(89280171, 36/sec) Mbps(50.82) Pers(SOURCEIP) Err(0) SO(0) LConn_Best [Idx:SubIdx] 57:0 PrimVserverDownBackupHits(4275) … ^C root@server# – Newbie May 27 '20 at 23:39
  • So adding your script to show anything greater than 50mbps to the NetScaler command parameter, root@server# nsconmsg -K log.199 -s ConLb=1 -d oldconmsg | awk '{match($0,/Mbps \(([^)]*)\)/,a);if(a[1] > 500){print NR,a[1]} }' awk: syntax error at source line 1 context is {match($0,/Mbps >>> \(([^)]*)\)/, <<< awk: illegal statement at source line – Newbie May 27 '20 at 23:40
  • Last comment shows the output when I ran your code, which resulted in errors - any suggestions please? – Newbie May 27 '20 at 23:51
  • @Newbie 1) you need GNU awk for my code, you can ensure you're using it by calling `gawk` 2) your comment above seems to have missed the backslash right after `Mbps ` as in `Mbps \((` – SiegeX May 27 '20 at 23:56
  • Thanks SiegeX, appreciate the reply.... ahhh, not sure how the backslash disappeared :-( during copy and pasting (don't recall changing your code) – Newbie May 28 '20 at 23:58
  • Anyway, tried again ensuring correct syntax and chars... however was greeted with root@server# nsconmsg -K log.199 -s ConLb=1 -d oldconmsg | awk '{match($0,/Mbps \(([^)]*)\)/,a);if(a[1] > 50){print NR,a[1]} }' awk: syntax error at source line 1 context is {match($0,/Mbps >>> \(([^)]*)\)/, <<< awk: illegal statement at source line 1 – Newbie May 29 '20 at 00:00
  • @Newbie replace `awk` with `gawk` and try again – SiegeX May 29 '20 at 00:02
  • Guessing this is a gawk function... which I don't have access to on this system. So probably a dead end for me using this code... nonetheless appreciate your time and effort... thank you! – Newbie May 29 '20 at 00:03
1

Using three rounds of sed (tested with GNU sed; not sure if it works on BSD sed). This mainly shows why sed is not the easiest tool for this job:

sed '=;s/.*).*(\([0-9.]*\)).*(.*/ \1/' output.log | 
sed ':a;s/[0-9]*/#>[line number &]/;N;s/\n//g;n;ba' | 
sed -n '/\b\([5-9]\|[0-9]\{2,\}\)[0-9]\{2,\}[^]]/p'

Or on BSD sed, which doesn't understand \n, try (tentative attempt, since I'm not running BSD):

sed '=;s/.*).*(\([0-9.]*\)).*(.*/ \1/' output.log | 
sed ':a;s/[0-9]*/#>[line number &]/;N;s/
//g;n;ba' | 
sed -n '/\b\([5-9]\|[0-9]\{2,\}\)[0-9]\{2,\}[^]]/p'

Output:

#>[line number 2] 500.15
#>[line number 3] 1500.01

Notes: Why three rounds?

  1. The = outputs the current line number, but the output bypasses any of the line buffers, making the line number invisible within a single invocation of sed.

  2. That = also outputs an unwanted \n, and in sed that's inconvenient to get rid of. See How can I replace a newline (\n) using sed? which shows how the code works.

  3. sed only sees strings, it doesn't know about numbers and has no idea how to find number ranges by value. See Using sed to replace a number greater than a specified number at a specified position for how we can fake it.
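
The "fake it" idea in note 3 can be seen in isolation with a smaller pattern. The sketch below is an illustration, not part of the sed pipeline above: it uses `grep -nE` (the function name `mbps_ge_500` and the sample lines are made up) and treats a value as large when its integer part is either three digits starting with 5 to 9, or four or more digits. Because it compares text rather than numbers, it matches values greater than or equal to 500 (so exactly 500.0 slips through), and a zero-padded value such as 0001.0 would match as well.

```shell
# Hypothetical helper: print "line_number:line" for lines whose Mbps value
# looks like 500 or more, judged purely by the shape of the digits.
mbps_ge_500() {
  grep -nE 'Mbps ?\(([5-9][0-9]{2}|[0-9]{4,})(\.[0-9]+)?\)'
}

# Made-up sample lines in the shape shown in the question
printf '%s\n' \
  'text (note) Mbps (1.0) text (note)' \
  'text (note) Mbps (500.15) text (note)' \
  'text (note) Mbps (1500.01) text (note)' | mbps_ge_500
```

Changing the threshold means rewriting the pattern by hand, which is the main reason the awk answers are easier to maintain.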

agc
  • Thanks agc, looks magical! will try out and feedback! Also really appreciate the breakdown! – Newbie May 21 '20 at 16:35
  • Hi agc, Please excuse the delay – was waiting for another 500mbps data to feed into the logs, but no show. So have decided to test the scripts against 50mbps.. however I did not know how to amend your code to look for numbers greater than 50mbps... perhaps you could amend it for this, please? – Newbie May 27 '20 at 23:35
  • @Newbie, To change the *500mbps* limit to *50mbps*, use all three lines of the same code as above, but just change **one digit** on the last line. Look at the *very end* of the third line: `\{2,\}[^]]/p'`, change that **`2`** to a **`1`**, so it looks like `\{1,\}[^]]/p'`. – agc May 28 '20 at 05:25
  • Thanks agc, appreciate the reply and suggestion, ran the amended code (change in end of line 3): sed '=;s/.*).*(\([0-9.]*\)).*(.*/ \1/' output.log | sed ':a;s/[0-9]*/#>[line number &]/;N;s/\n//g;n;ba' | sed -n '/\b\([5-9]\|[0-9]\{2,\}\)[0-9]\{1,\}[^]]/p' – Newbie May 28 '20 at 23:51
  • It returned the following result: sed: 1: ":a;s/[0-9]*/#>[line num ...": unused label 'a;s/[0-9]*/#>[line number &]/;N;s/\n//g;n;ba' Any further ideas please? – Newbie May 28 '20 at 23:52
  • @Newbie, *GNU* `sed` and *BSD* `sed` have a few differences. See *[sed behaves different on FreeBSD and on Linux?](https://unix.stackexchange.com/questions/101059/sed-behaves-different-on-freebsd-and-on-linux)*. The laziest thing to do is install *GNU* `sed` from a ports repository, (`gsed` on *BSD*), which according to a [FreeBSD wiki about `sed`](http://freebsdwiki.net/index.php/Sed) is possible -- then just run the `gsed` code and not worry about the differences. – agc May 29 '20 at 11:14
  • Thanks agc, appreciate the info re sed variances and potential workaround. Unfortunately I am working on production systems, therefore installing new add-ons (simple as it might be) would not pass the risk approval system... however I appreciate your suggestion and will bear in mind for future ref... thanks again! – Newbie Jun 02 '20 at 17:19
1

With brackets as shown, you could use them as input field separators with awk:

awk -F '[()]' '($4+0) > 500 {print FNR, $4}' file

You may also want to check that $3 ends in Mbps:

awk -F '[()]' '($4+0) > 500 && $3~/Mbps *$/ {print FNR, $4}' file
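
Both commands assume the Mbps value is always the fourth field, which only holds when exactly one bracketed group precedes it. A hedged variant, assuming the number of bracketed groups before `Mbps` can vary (as in the `nsconmsg` lines quoted in the comments on this page), is to scan for the field that ends in `Mbps` and test the field after it. The function name `mbps_by_field` is made up for illustration.

```shell
# Hypothetical helper: split on "(" and ")", find the field ending in "Mbps",
# and print "line_number value" when the next field exceeds the threshold
# (first argument, default 500). Reads the log on stdin.
mbps_by_field() {
  awk -F '[()]' -v min="${1:-500}" '
    {
      for (i = 1; i < NF; i++)
        if ($i ~ /Mbps *$/ && ($(i + 1) + 0) > min + 0) {
          print FNR, $(i + 1)
          break   # stop at the first Mbps field on this line
        }
    }'
}

# The two lines quoted from the nsconmsg output in the comments
printf '%s\n' \
  'VIP(0.0.0.0:0:UP:LEASTCONNS): Hits(89266495, 63/sec) Mbps(50.62) Pers(SOURCEIP) Err(0) SO(0)' \
  'VIP(0.0.0.0:0:UP:LEASTCONNS): Hits(89280171, 36/sec) Mbps(50.82) Pers(SOURCEIP) Err(0) SO(0)' | mbps_by_field 50
# prints:
# 1 50.62
# 2 50.82
```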
  • Thanks user13582001, looks magical! will try out and feedback! – Newbie May 21 '20 at 16:36
  • Hi user13582001, No luck I am afraid. 1st code returned the characters within the first set of open/closed brackets (it did not give me the bracketed numbers after the "Mbps" match). 2nd code returned nothing. – Newbie May 27 '20 at 23:32
0

You can use awk to match the lines containing Mbps ( followed by any non-) characters followed by ). On those lines, delete everything from the beginning of the line up to and including Mbps (, and then everything from ) to the end of the line.

If what remains of the line, converted to a number (+0), is greater than 500, print the line number and the remaining text.

awk '
  /Mbps \([^)]*\)/{ sub(/.*Mbps \(/, ""); sub(/\).*/, "") }
  ($0+0) > 500{ print FNR, $0 }
' file

Edit: To match lines containing an optional space after Mbps with a value > 50, use

awk '
  /Mbps ?\([^)]*\)/{ sub(/.*Mbps ?\(/, ""); sub(/\).*/, "") }
  ($0+0) > 50{ print FNR, $0 }
' file
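
One caveat with the two-rule form: the numeric test also runs on lines that never matched the first pattern, so a line that happens to begin with a large number would be printed whole. A sketch that folds the test into the matching rule, written on a single line and with the threshold passed as a variable, might look like this. The function name `mbps_one_rule` and the first sample line are made up; the second sample line is shortened from the `nsconmsg` output quoted in the comments.

```shell
# Hypothetical helper: same sub()-based extraction, but the comparison only
# runs on lines that contain "Mbps(...)". Threshold is the first argument.
mbps_one_rule() {
  awk -v min="${1:-50}" '/Mbps ?\([^)]*\)/ { sub(/.*Mbps ?\(/, ""); sub(/\).*/, ""); if ($0 + 0 > min + 0) print FNR, $0 }'
}

printf '%s\n' \
  '9999 made-up banner line without a rate' \
  'VIP(0.0.0.0:0:UP:LEASTCONNS): Hits(89266495, 63/sec) Mbps(50.62) Pers(SOURCEIP)' | mbps_one_rule 50
# prints:
# 2 50.62
```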
Freddy
  • Please excuse the delay – was waiting for another 500mbps data to feed into the logs, but no show. So have decided to test the scripts against 50mbps: As you can see from the simple grep filter output from log.199, there are tons of 50mbps lines results – Newbie May 27 '20 at 23:41
  • root@server# nsconmsg -K log.199 -s ConLb=1 -d oldconmsg | grep -i "Mbps(50" VIP(0.0.0.0:0:UP:LEASTCONNS): Hits(89266495, 63/sec) Mbps(50.62) Pers(SOURCEIP) Err(0) SO(0) LConn_Best [Idx:SubIdx] 57:0 PrimVserverDownBackupHits(4275) VIP(0.0.0.0:0:UP:LEASTCONNS): Hits(89280171, 36/sec) Mbps(50.82) Pers(SOURCEIP) Err(0) SO(0) LConn_Best [Idx:SubIdx] 57:0 PrimVserverDownBackupHits(4275) … ^C root@server# – Newbie May 27 '20 at 23:41
  • root@server# nsconmsg -K log.199 -s ConLb=1 -d oldconmsg | awk '{/Mbps \([^)]*\)/{ sub(/.*Mbps \(/, ""); sub(/\).*/, "")} ($0+0) > 50{print FNR, $0}' awk: syntax error at source line 1 context is {/Mbps >>> \([^)]*\)/{ <<< awk: illegal statement at source line 1 missing } missing ) – Newbie May 27 '20 at 23:42
  • Last comment shows the output when I ran your code, which resulted in errors - any suggestions please? – Newbie May 27 '20 at 23:50
  • Your code looks different. My line doesn't start with an opening `{` and compare which parentheses are escaped. – Freddy May 28 '20 at 01:16
  • Appreciate the reply Freddy, indeed your code did not start with { so I retried exactly as you wrote it using 50Mbps... whilst it ran without error, it returned empty-handed (despite being certain log.99 file has lines with 50Mbps)... any other suggestions please? – Newbie May 28 '20 at 23:44
  • Argh, [your example](https://stackoverflow.com/questions/61903702/bash-script-to-extract-data-from-large-log-file/61903843?noredirect=1#comment109753035_61903843) doesn't have a space character after `Mbps` and before `(`. Try to remove the space character after `Mbps` in my command (twice) and again change 500 to 50. – Freddy May 29 '20 at 01:35
  • Thanks for the reply Freddy... I do not fully understand the last suggestions you made. Could you kindly type out your code in a single line, if possible please? – Newbie Jun 02 '20 at 17:14
  • ***Magician*** you are O:-) Thanks Freddy, that worked! Really appreciate your time, patience and amazing bash skills :-)) – Newbie Jun 03 '20 at 08:56
0

I improved the solution of @Freddy a bit

awk '/Mbps.\(.*\)/{sub(/.*Mbps \(/, ""); sub(/\).*/, "")} ($0+0) > 500{print $0}' output.log

Please give him the check :))

  • Hi Max, Whilst I did not receive any errors as I did for Freddy's, your code (amended to check for 50Mbps) returned no results... any ideas or suggestions please? root@server# nsconmsg -K log.199 -s ConLb=1 -d oldconmsg | awk '/Mbps.\(.*\)/{sub(/.*Mbps \(/, ""); sub(/\).*/, "")} ($0+0) > 50{print $0}' root@server# – Newbie May 27 '20 at 23:46