
I have a log file with a standard format, e.g.:

31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3

I want to replace `31*31` with `31` (treating `*` as a wildcard), so that I end up with a log that contains only its last line. In this example it would look like:

31 Mar - Lorem Ipsom3

I want to do this on a customized Linux machine that has no Perl. I tried to use sed like this:

sed -i -- 's/31*31/31/g' /var/log/prog/logFile

But it did nothing. Alternatives involving ninja bash commands are also welcome.

GalB1t
  • Not very sure what `31*31` means here. Could you clarify? Or do you just want to print the last line containing 31? Does the file contain other lines? – fedorqui Mar 31 '15 at 11:17
  • @fedorqui: I think the OP confuses wildcards with quantifiers. – Willem Van Onsem Mar 31 '15 at 11:19
  • @CommuSoft yes but neither the glob `31*31` nor the regex `31.*31` match any lines. – terdon Mar 31 '15 at 11:20
  • @fedorqui: you can enable newline processing in `sed`. Then it works, although it will have side effects the OP probably didn't take into account. – Willem Van Onsem Mar 31 '15 at 11:20
  • @fedorqui I meant `*` as a wildcard, sorry if that wasn't clear. As CommuSoft said, I got mixed up with the `*` operator from the more theoretical CS material I'm working on. – GalB1t Mar 31 '15 at 12:28
  • @GalB1t it may be good to update your question with a proper explanation. You can do it by pressing the Edit button. – fedorqui Mar 31 '15 at 13:12
  • Do **all** lines in the file start with the date? If yes, `tail -1 file` is all you need. – glenn jackman Mar 31 '15 at 13:31

3 Answers


A way to keep only the last line of each run of consecutive lines that match a pattern is:

sed -n '/^31/ { :a $!{ h; n; //ba; x; G } }; p' filename

This works as follows:

/^31/ {    # if a line begins with 31
  :a       # jump label for looping

  $!{      # if the end of input has not been reached (otherwise the current
           # line is the last line of the block by virtue of being the last
           # line)

    h      # hold the current line
    n      # fetch the next line (because of -n, this does not print the
           # current line first)

    //ba   # if that line also begins with 31, go to :a. // attempts the
           # most recently attempted regex again, which was ^31

    x      # swap hold buffer, pattern space
    G      # append hold buffer to pattern space. The PS now contains
           # the last line of the block followed by the first line that 
           # comes after it
  }
}
p          # in the end, print the result

This avoids some problems of multi-line regular expressions, such as matches that begin or end in the middle of a line. It also keeps any lines between two blocks of matching lines instead of discarding them, and keeps the last line of each block.
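To see the behaviour described above, here is a small test run. It is a sketch that assumes GNU sed (other sed implementations may parse the label `:a` followed by a space differently), and the sample log is made up: a non-matching line, a block of two `31` lines, an unrelated line, and a one-line block at the very end of the input.

```shell
#!/bin/sh
# Made-up sample log (assumption, not from the question).
log='30 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
other line
31 Mar - Lorem Ipsom3'

# Keep only the last line of each run of consecutive lines starting with 31.
# Assumes GNU sed.
out=$(printf '%s\n' "$log" | sed -n '/^31/ { :a $!{ h; n; //ba; x; G } }; p')
printf '%s\n' "$out"
```

The non-matching lines come through untouched, the first block collapses to `31 Mar - Lorem Ipsom2`, and the final block (which ends at the last line of input) is printed as is.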

Wintermute

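`*` is not a wildcard as it is in the shell; in a regex it is a quantifier meaning "zero or more of the preceding item". So `31*31` means "3, then zero or more 1s, then 31", which none of your lines contain, and that is why sed did nothing. You need to quantify over `.` (any character), so the regex is: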

sed ':a;N;$!ba;s/31.*31/31/g' /var/log/prog/logFile

(I removed the -i flag so you can first test your file safely).

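The `:a;N;$!ba;` part appends each line to the pattern space (`N`) and loops back to label `a` until the last line is reached (`$!ba`), so the whole file is processed as one string and the regex can match across newlines.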
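A minimal sketch, assuming GNU sed (the `:a;N;$!ba` one-liner and `.` matching a newline inside the pattern space are GNU behaviour), that runs both the original command and the whole-file version on the log from the question:

```shell
#!/bin/sh
# The log from the question.
log='31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3'

# Original attempt: as a regex, 31*31 means "3, then zero or more 1s,
# then 31". No line contains that, so sed leaves the input unchanged.
before=$(printf '%s\n' "$log" | sed 's/31*31/31/g')

# :a;N;$!ba appends every line to the pattern space (N) and loops back to
# label a until the last line ($!ba); the s command then sees the whole
# file as one string, newlines included. Assumes GNU sed.
after=$(printf '%s\n' "$log" | sed ':a;N;$!ba;s/31.*31/31/g')

printf '%s\n' "$after"
```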

Note however:

  • The regex matches any 31, not only one at the start of a line, so:

    31 Mar - Lorem Ipsom1
    31 Mar - Lorem 31 Ipsom2
    

    Will result in

    31 Ipsom2
    
  • The match is greedy. If the log reads:

    31 Mar - Lorem Ipsom1
    30 Mar - Lorem Ipsom2
    31 Mar - Lorem Ipsom3

    it removes the second line as well.
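Both pitfalls can be reproduced with the made-up inputs above, again assuming GNU sed. `slurp_replace` is just a local helper function named for this example:

```shell
#!/bin/sh
# Hypothetical helper for this example: read all input into the pattern
# space, then apply the greedy substitution. Assumes GNU sed.
slurp_replace() {
  sed ':a;N;$!ba;s/31.*31/31/g'
}

# Pitfall 1: a 31 in the middle of a line also counts as a match.
mid=$(printf '%s\n' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem 31 Ipsom2' | slurp_replace)

# Pitfall 2: .* is greedy, so the 30 Mar line in between is removed too.
greedy=$(printf '%s\n' '31 Mar - Lorem Ipsom1' '30 Mar - Lorem Ipsom2' \
  '31 Mar - Lorem Ipsom3' | slurp_replace)

printf '%s\n' "$mid" "$greedy"
```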

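You can solve the first problem by writing:

sed -E ':a;N;$!ba;s/(^|\n)31.*\n(31)/\1\2/g'

This forces both occurrences of 31 to be at the beginning of a line. The -E flag is needed because in sed's default basic regex syntax `(`, `|` and `)` are literal characters, and `\1` puts back the newline matched before the first 31 so that it is not deleted along with the rest of the match.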
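A sketch of the anchored version, assuming GNU sed (`\n` inside a regex is a GNU extension). It uses extended regex syntax (`-E`) and restores the matched newline with `\1`. `anchored` is a helper function named only for this example; note that the second problem (greediness) remains:

```shell
#!/bin/sh
# Hypothetical helper for this example. (^|\n)31 and \n(31) require both
# 31s to start a line; \1 restores the newline (if any) before the block.
# Assumes GNU sed.
anchored() {
  sed -E ':a;N;$!ba;s/(^|\n)31.*\n(31)/\1\2/g'
}

# A 31 in the middle of a line no longer ends the match.
mid=$(printf '%s\n' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem 31 Ipsom2' | anchored)

# The line before the block keeps its newline.
kept=$(printf '%s\n' 'header' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem Ipsom2' | anchored)

# Still greedy: the 30 Mar line between two 31 lines is removed.
still_greedy=$(printf '%s\n' '31 Mar - Lorem Ipsom1' '30 Mar - Lorem Ipsom2' \
  '31 Mar - Lorem Ipsom3' | anchored)

printf '%s\n' "$mid" "$kept" "$still_greedy"
```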

Willem Van Onsem

I think you might be looking for `tail` to get the last line of the file, e.g.

tail -1 /path/file

Or, if you want the last entry from each day, `sort` might be your solution:

tac /path/file | sort -u -k 1,2
  • the -u flag specifies that only a single line is returned for each distinct key
  • the -k 1,2 specifies that the key is made of the first two fields, in this case the day and the month; by default, fields are separated by white space.
  • tac reverses the file first. GNU sort -u keeps the first line of each group of lines with equal keys, so after the reversal that is the last entry for each day. (The -r flag would not do this: it reverses the order of the keys, not the order of lines that share a key.)
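A minimal check of the keep-the-last-entry-per-day idea, assuming GNU coreutils `tac` and `sort`. GNU `sort -u` keeps the first line of each group with equal keys, in input order, so reversing the input beforehand makes the last entry of each day the one that survives. The log below is made up:

```shell
#!/bin/sh
# Made-up log with two days (assumption, not from the question).
log='30 Mar - Lorem Ipsom2
30 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3'

# Reverse the lines so the last entry of each day comes first, then keep
# one line per key (fields 1-2). LC_ALL=C gives a predictable byte order.
last_per_day=$(printf '%s\n' "$log" | tac | LC_ALL=C sort -u -k 1,2)
printf '%s\n' "$last_per_day"
```

The output is ordered by key, so with several months in one file `1 Apr` would sort before `30 Mar`; the `cat -n` pipeline handles that case.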

If your log file has more than a single month of data and you wish to preserve the original order (e.g. if you have Mar 31 and Apr 1 in the same file), you can try:

cat -n tmp2 | sort -nr | sort -u -k 2,3 | sort -n | cut -f 2-
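  • cat -n prefixes each line with its line number before sorting.
  • sort -nr orders the lines by that number in reverse, so the last entry of each day comes first.
  • sort -u -k 2,3 works as before but on fields 2 and 3, because field 1 is now the original line number.
  • sort -n sorts by the original line number to restore the original order.
  • cut -f 2- removes the line numbers (cat -n separates them with a tab) and restores the original line content.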

e.g.

 $ cat tmp2
 30 Mar - Lorem Ipsom2
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom2
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom1
 1 Apr - Lorem Ipsom2

 $ cat -n tmp2 | sort -nr | sort -u -k 2,3 | sort -n | cut -f 2-
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom2
Alan Dyke