Sed regex string substitution from terminal

Question

I have a log file with a standard format, e.g.:

31 Mar - Lorem Ipsom1
31 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3

The replacement I want is 31*31 to 31, so that I end up with a log that has only its last line. In this example it would look like:

31 Mar - Lorem Ipsom3

I want to do this on a customized Linux machine that has no Perl. I tried to use sed like this:

sed -i -- 's/31*31/31/g' /var/log/prog/logFile

But it did nothing. Any alternatives involving ninja bash commands are also welcome.

Not very sure what `31*31` means here. Could you clarify? Or do you just want to print the last line containing 31? Does the file contain other lines? – fedorqui, Mar 31 '15 at 11:17
@fedorqui: I think the OP confuses wildcards with quantifiers. – Willem Van Onsem, Mar 31 '15 at 11:19
@CommuSoft yes, but neither the glob `31*31` nor the regex `31.*31` matches any lines. – terdon, Mar 31 '15 at 11:20
@fedorqui: you can enable newline processing in `sed`. Then it works, although it will have side effects the OP probably didn't take into account. – Willem Van Onsem, Mar 31 '15 at 11:20
@fedorqui I meant "*" as a wildcard, sorry if that wasn't specified. I mixed it up with the "*" sign from the more theoretical CS stuff I'm busy with, as CommuSoft said. – GalB1t, Mar 31 '15 at 12:28
@GalB1t it may be good to update your question with a proper explanation. You can do it by pressing the Edit button. – fedorqui, Mar 31 '15 at 13:12
Do **all** lines in the file start with the date? If yes, `tail -1 file` is all you need. – glenn jackman, Mar 31 '15 at 13:31

score 4 · Accepted Answer · answered Mar 31 '15 at 11:29

A way to keep only the last of consecutive lines that match a pattern is

sed -n '/^31/ { :a $!{ h; n; //ba; x; G } }; p' filename

This works as follows:

/^31/ {    # if a line begins with 31
  :a       # jump label for looping

  $!{      # if the end of input has not been reached (otherwise the current
           # line is the last line of the block by virtue of being the last
           # line)

    h      # hold the current line
    n      # fetch the next line. (note that this doesn't print the line
           # because of -n)

    //ba   # if that line also begins with 31, go to :a. // attempts the
           # most recently attempted regex again, which was ^31

    x      # swap hold buffer, pattern space
    G      # append hold buffer to pattern space. The PS now contains
           # the last line of the block followed by the first line that 
           # comes after it
  }
}
p          # in the end, print the result

This avoids some problems of multi-line regular expressions, such as matches that begin or end in the middle of a line. It also keeps the lines between two blocks of matching lines, and keeps the last line of each block.
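
As a quick check of that behaviour, here is a minimal sketch run against a made-up sample in which two blocks of 31 lines are separated by a 30 line. The file contents and variable names are assumptions for illustration, and the script is written one command per line so that it does not depend on how a particular sed parses labels on a single line.

```shell
# Hypothetical sample: two blocks of "31" lines with a "30" line between them
sample=$(mktemp)
printf '%s\n' \
  '31 Mar - Lorem Ipsom1' \
  '31 Mar - Lorem Ipsom2' \
  '30 Mar - Lorem Ipsom3' \
  '31 Mar - Lorem Ipsom4' \
  '31 Mar - Lorem Ipsom5' > "$sample"

# Same commands as in the answer, one per line
result=$(sed -n '
/^31/ {
  :a
  $!{
    h
    n
    //ba
    x
    G
  }
}
p' "$sample")

# Prints the last line of each "31" block plus the untouched "30" line
printf '%s\n' "$result"
rm -f "$sample"
```

The output should contain Ipsom2, Ipsom3 and Ipsom5: the last line of each block, with the 30 line left in place.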

score 2 · Answer 2 · edited May 23 '17 at 12:28

In a regex, * is not a wildcard as it is in the shell; it is a quantifier that repeats whatever precedes it. You need to quantify over . (any character). The regex is thus:

sed ':a;N;$!ba;s/31.*31/31/g'

(I removed the -i flag so you can first test your file safely).

The :a;N;$!ba; part reads the whole file into the pattern space, which makes it possible to match across newlines.
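
To see the difference, here is a small sketch, assuming GNU sed and a made-up three-line sample (the variable names are only for illustration). As a regex, 31*31 means a 3, zero or more 1s, then 31, so it matches strings such as 331 or 3131 but nothing in these log lines. The last command shows that 31.*31 only spans lines once :a;N;$!ba has joined them.

```shell
# Hypothetical sample log held in a variable
log=$(printf '%s\n' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem Ipsom2' '31 Mar - Lorem Ipsom3')

# 31*31 is "3", zero or more "1", then "31": no log line contains that
unchanged=$(printf '%s\n' "$log" | sed 's/31*31/31/g')

# Without the loop sed sees one line at a time, so 31.*31 never spans two lines
per_line=$(printf '%s\n' "$log" | sed 's/31.*31/31/g')

# With the loop the whole input is one string and the greedy match reaches the last 31
joined=$(printf '%s\n' "$log" | sed ':a;N;$!ba;s/31.*31/31/g')

printf '%s\n' "$joined"
```

The first two results are identical to the input; only the third collapses the log to its last line.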

Note however:

The regex will match any 31, not only one at the start of a line, so:

31 Mar - Lorem Ipsom1
31 Mar - Lorem 31 Ipsom2

will result in

31 Ipsom2

It also matches greedily. If the log reads:

31 Mar - Lorem Ipsom1
30 Mar - Lorem Ipsom2
31 Mar - Lorem Ipsom3

it will remove the second line as well.

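You can solve the first problem by writing (with GNU sed; -E enables the extended syntax that the ( | ) group needs, and \1 puts back the newline that the group may have matched):

sed -E ':a;N;$!ba;s/(^|\n)31.*\n31/\131/g'

This forces both 31s to be located at the beginning of a line.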
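
A minimal sketch of the anchored variant, assuming GNU sed. The function name anchored and the sample lines are made up, and spelling the command with -E and a \1 backreference is an assumption about how to write it for GNU sed. The first case has a stray 31 inside a line, the second shows that a preceding line stays intact, and the third shows that the greedy match still swallows a non-matching line in between.

```shell
# Hypothetical helper wrapping the anchored substitution (GNU sed assumed)
anchored() { sed -E ':a;N;$!ba;s/(^|\n)31.*\n31/\131/g'; }

# A 31 in the middle of a line no longer ends the match
fixed=$(printf '%s\n' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem 31 Ipsom2' | anchored)

# \1 keeps the newline, so the line before the block is not glued to the result
kept=$(printf '%s\n' '30 Mar - Lorem Ipsom0' '31 Mar - Lorem Ipsom1' '31 Mar - Lorem Ipsom2' | anchored)

# The greedy problem remains: the 30 line between two 31 lines is dropped
greedy=$(printf '%s\n' '31 Mar - Lorem Ipsom1' '30 Mar - Lorem Ipsom2' '31 Mar - Lorem Ipsom3' | anchored)

printf '%s\n' "$fixed" "$kept" "$greedy"
```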

answered Mar 31 '15 at 11:18

Willem Van Onsem

seems it works while there is no `31` in text (and no 31 at the start of the line/earlier line) like `3 Mar - Lorem Ipsom311` – NeronLeVelu Mar 31 '15 at 12:57
@NeronLeVelu: the first or the second? – Willem Van Onsem Mar 31 '15 at 13:05
with your correction it's better, but what about internal lines that don't start with 31? (OP was not clear about this, the sample shows only 31 lines, so it's OK for me) – NeronLeVelu Mar 31 '15 at 13:37

Alan Dyke · Answer 3 · 2015-03-31T20:51:12.133

I think you might be looking for "tail" to get the last line of the file e.g.

tail -1 /path/file

or if you want the last entry from each day then "sort" might be your solution

sort -ur -k 1,2 /path/file | sort

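the -u flag specifies that only a single line is returned for each distinct value of the key fields
the -k 1,2 specifies that the key fields are the first two fields, in this case the day and the month; fields are separated by white space by default.
the -r flag reverses the sort order, which is meant to make -u return the last match for each date. Which of several lines with equal keys -u keeps depends on the sort implementation (GNU sort keeps the first one in input order), so check the output on your system. Sort a second time to put the result back in ascending order.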

If your log file has more than a single month of data, and you wish to preserve order (e.g. if you have Mar 31 and Apr 1 in the same file) you can try:

cat -n tmp2 | sort -nr | sort -u -k 2,3 | sort -n | cut -f 2-

cat -n adds the line number to each line of the log file before sorting.
sort -nr reverses the file by line number, so the last entry for each date comes first.
sort -u -k 2,3 works as before but uses fields 2 and 3, because field 1 is now the original line number.
sort -n sorts by the original line number to restore the original order.
cut -f 2- removes the line numbers and restores the original line content.

e.g.

 $ cat tmp2
 30 Mar - Lorem Ipsom2
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom2
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom1
 1 Apr - Lorem Ipsom2

 $ cat -n tmp2 | sort -nr | sort -u -k 2,3 | sort -n | cut -f 2-
 30 Mar - Lorem Ipsom1
 31 Mar - Lorem Ipsom3
 1 Apr - Lorem Ipsom2
