How can I improve my sed command to extract data from a ping log file?

Question

Details follow. The site asks me to add some text because the post is mostly code, so I am adding this sentence, but I think the question is self-explanatory.

Sample log file :

jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
jue 08 abr 2021 13:37:21 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.058/47.762/64.169 ms

my command so far :

cat sample.log | sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'

result obtained :

jue 08 abr 2021 13:33
0%
jue 08 abr 2021 13:35
1%
jue 08 abr 2021 13:37
0%

desired ideal result :

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%

RavinderSingh13 · Answer 1 · 2021-12-02T19:06:10.183

1st solution: This looks like a job for awk. With the samples shown, please try the following awk code.

awk -v OFS=", " '
match($0,/^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2}/){
  val=substr($0,RSTART,RLENGTH-3)
  next
}
/packets transmitted/{
  print val,$(NF-2)
  val=""
}
'  Input_file

Explanation: The match function tests each line against the regex ^[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} ([0-9]{2}:){2}[0-9]{2} (explained further down). When a line matches, val is set to the matched text minus its last 3 characters (the seconds), and next skips the remaining rules for that line. When a line contains packets transmitted, val is printed together with the third-from-last field of that line (the loss percentage), and val is then reset to an empty string.

Explanation of regex:

^[a-zA-Z]+               ##Matching 1 or more lowercase/uppercase letters at the start of the line.
 [0-9]{2}                ##Matching a space followed by 2 digits.
 [a-zA-Z]+               ##Matching a space followed by 1 or more lowercase/uppercase letters.
 [0-9]{4}                ##Matching a space followed by 4 digits.
 ([0-9]{2}:){2}[0-9]{2}  ##Matching a space, then 2 digits and a colon (this group occurring 2 times), then 2 more digits.
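
To see what match, RSTART and RLENGTH do in the first solution, here is a minimal sketch that runs the same idea on a single line copied from the sample log. The shell variables `line` and `stamp` are only illustrative names, and the `{n}` intervals are spelled out as repeated `[0-9]` purely as a precaution, on the assumption that not every awk implementation supports them.

```shell
line='jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes'

stamp=$(printf '%s\n' "$line" | awk '
# match() sets RSTART to the 1-based start of the match and RLENGTH to its length.
match($0, /^[a-zA-Z]+ [0-9][0-9] [a-zA-Z]+ [0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
  # RLENGTH - 3 drops the trailing ":SS", leaving the time as HH:MM.
  print substr($0, RSTART, RLENGTH - 3)
}')

printf '%s\n' "$stamp"
```

This prints `jue 08 abr 2021 13:33`, the value that the first solution stores in val.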

2nd solution: With GNU awk, almost the same regex can be used in the RS variable to get the desired result:

awk -v RS='[a-zA-Z]+ [0-9]{2} [a-zA-Z]+ [0-9]{4} [0-9]{2}:[0-9]{2}|[0-9]{1,3}%' -v OFS=", " '
RT{
  val=(val?val (++count%2==0?ORS:OFS):"") RT
}
END{
  print val
}
'  Input_file

Thanks for the answer. I am not familiar with awk, but I just tried it and didn't get my desired result: I got a mix of lines with only x% and other lines with the date and time next to x%, separated by a space. – mrossw, Dec 02 '21 at 19:04
@mrossw, with your shown samples, both of my solutions worked fine for me. Is your actual Input_file the same as the shown samples? – RavinderSingh13, Dec 02 '21 at 19:04
@mrossw, also make sure you don't have carriage returns in your file. Try running `cat -v your_file` once; if you see control-M characters in the output, you need to remove them before running my code. – RavinderSingh13, Dec 04 '21 at 03:54
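
The carriage-return check suggested in the last comment can be sketched as follows. This is only an illustration: the temporary file and the variable names are made up for the example, and `tr -d '\r'` is one common way to strip carriage returns, not necessarily the one the commenter had in mind.

```shell
# Build a tiny file with CRLF line endings for illustration.
tmp=$(mktemp)
printf '100 packets transmitted, 100 packets received, 0%% packet loss\r\n' > "$tmp"

# Count lines containing a carriage return (cat -v would show them as ^M).
cr=$(printf '\r')
before=$(grep -c "$cr" "$tmp" || true)

# tr -d deletes every carriage return from its input.
cleaned=$(tr -d '\r' < "$tmp")
after=$(printf '%s\n' "$cleaned" | grep -c "$cr" || true)

printf 'lines with CR before: %s, after: %s\n' "$before" "$after"
rm -f "$tmp"
```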

score 2 · Accepted Answer · answered Dec 03 '21 at 15:52

Let's assume:

The packet loss percentage is always found in lines ending with NUM% packet loss.
The date and time are always found in lines ending with data bytes.

Then, with GNU sed (tested on the two complete records you show):

$ sed -nE '/packet loss$/{s/.*\s([0-9]+%) packet loss$/\1/;h}
  /data bytes$/{s/(.{24}).*/\1/;G;s/\n/, /;p}' sample.log
jue 08 abr 2021 13:35:35, 0%
jue 08 abr 2021 13:37:21, 1%
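
In the sample log as shown in the question, each `data bytes` header comes before the statistics of the same ping run. Under that assumption, a variant that stores the date in the hold space and prints when the loss line arrives pairs each date with the loss that follows it. This is a sketch rather than the script above: the roles of the two addresses are swapped, `[[:space:]]` replaces `\s` so it does not rely on GNU extensions, and the shell variables `log` and `paired` are illustrative names.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 99 packets received, 1% packet loss
round-trip min/avg/max = 42.055/48.856/136.962 ms
EOF

# Header line: keep the first 24 characters (date and time), copy them to the hold space (h).
# Loss line: reduce it to "N%", append it to the hold space (H), copy the hold space back (g),
# then turn the joining newline into ", " and print.
paired=$(sed -nE '/data bytes$/{s/(.{24}).*/\1/;h;}
  /packet loss$/{s/.*[[:space:]]([0-9]+%) packet loss$/\1/;H;g;s/\n/, /;p;}' "$log")

printf '%s\n' "$paired"
rm -f "$log"
```

On these two records it prints `jue 08 abr 2021 13:33:49, 0%` and `jue 08 abr 2021 13:35:35, 1%`.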

answered Dec 03 '21 at 15:52

Renaud Pacalet

thanks, i like the look of that. Reading man to figure out what h and G command do and what are pattern space and hold space ... – mrossw Dec 09 '21 at 20:31
[this answer](https://stackoverflow.com/questions/12833714/the-concept-of-hold-space-and-pattern-space-in-sed#12834372) is quite clarifying about pattern and hold spaces. Now I am wondering if the new line within the expression can be replaced by some other character @renaud-pacalet ? – mrossw Dec 09 '21 at 20:54
In the sed script I suggest the newline character is replaced by `, ` (comma-space). Replace it by anything else if it is not what you want. – Renaud Pacalet Dec 10 '21 at 05:21
I was referring to the newline before the 2nd 'address' (or whatever sed calls it), that is '/data bytes$/'. I inferred it was there to separate it from the first one, and realized that replacing it with a semicolon worked just the same. I needed it on one line to be able to copy/paste it easily. – mrossw Dec 13 '21 at 14:39
Oh! Sorry, I misunderstood your question. Yes, sed commands can be separated by newlines or by semicolons. – Renaud Pacalet Dec 13 '21 at 14:43
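
The equivalence discussed in these comments can be checked with a small sketch: the answer's script is run once with a newline and once with a semicolon between the two blocks, and the outputs are compared. GNU sed is assumed, as in the answer (`\s` is kept); the temporary file and the variable names `multi` and `single` are illustrative.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
jue 08 abr 2021 13:33:49 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes

--- google.com ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 40.462/50.166/62.318 ms
jue 08 abr 2021 13:35:35 -03 : VA  John Doe  : PING google.com (17x.2xx.1x2.4x): 56 data bytes
EOF

# Two blocks separated by a newline, as written in the answer.
multi=$(sed -nE '/packet loss$/{s/.*\s([0-9]+%) packet loss$/\1/;h}
  /data bytes$/{s/(.{24}).*/\1/;G;s/\n/, /;p}' "$log")

# The same two blocks on one line, separated by a semicolon.
single=$(sed -nE '/packet loss$/{s/.*\s([0-9]+%) packet loss$/\1/;h};/data bytes$/{s/(.{24}).*/\1/;G;s/\n/, /;p}' "$log")

if [ "$multi" = "$single" ]; then echo "same output"; else echo "outputs differ"; fi
rm -f "$log"
```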

s.ouchene · Answer 3 · 2021-12-02T20:29:45.833

0

To get the desired format, you can pipe the output to:

sed 'N;s/\n/, /'

The final command becomes (note that you don't need to cat to sed as it accepts the filename as an argument):

sed -r -e '/^... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}/{s/(... [0-9]+ ... [0-9]{4} [0-9]{2}:[0-9]{2}).*$/\1/g;n;d}' -e '/^--- google.*$/d' -e 's/100 packets transmitted.*([0-9]+%) packet.*$/\1/' -e '/round-trip/d'  sample.log | sed 'N;s/\n/, /'

Output:

jue 08 abr 2021 13:33, 0%
jue 08 abr 2021 13:35, 1%
jue 08 abr 2021 13:37, 0%
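
As a minimal sketch of what the second sed does in isolation: N appends the next input line to the pattern space, separated by a newline, and the substitution then replaces that newline with a comma and a space. The input lines are taken from the question's intermediate result; the variable name `joined` is illustrative.

```shell
joined=$(printf '%s\n' 'jue 08 abr 2021 13:33' '0%' 'jue 08 abr 2021 13:35' '1%' |
  sed 'N;s/\n/, /')

printf '%s\n' "$joined"
```

This prints `jue 08 abr 2021 13:33, 0%` and `jue 08 abr 2021 13:35, 1%`.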

edited Dec 02 '21 at 20:29

answered Dec 02 '21 at 20:23

s.ouchene

Thanks, this gets me what I need. I was hoping for some optimization of the expression. – mrossw Dec 09 '21 at 15:32
