1

I have some log output that looks like this:

<DEBUG> 01-Jan-1970::01:01:01.012 JavaClassName Lala-Worker-Thread-0: - This line contains information
<DEBUG> 01-Jan-1970::01:01:01.012 JavaClassName Lala-Worker-Thread-0: - Result received: 'com.lala.lulu.SomeClass@13579ace'

I'd like to prune this output to

<DEBUG> - This line contains information
<DEBUG> - Result received: 'com.lala.lulu.SomeClass@13579ace'

I've found something that sort of works but sometimes does stuff that I don't want, so I want to "quality proof" it a bit:

$ sed -r 's/(<[A-Z]+>)(.*:)/\1/g' theLogFile
<DEBUG> - This line contains information
<DEBUG> 'com.lala.lulu.SomeClass@13579ace'

Who knows what else my regex is capable of (in a bad way), so what's a better solution? This works better but seems kind of dumb: it captures the dash in a group that gets discarded, and then I add the dash back manually:

sed -r 's/(<[A-Z]+>)(.*: - )(.*)/\1 - \3/g'

Anyway, advice, anyone? Thank you.

Erik Vesterlund
  • 1
    I'd probably do something like `perl -pe 's/^<[A-Z]+> \K\S+ \S+ \S+: //'`. – melpomene Dec 10 '17 at 14:47
  • 1
    You may need to have a clearer idea of what your pattern is. From the limited example of yours, I'd guess all debug lines provide the information you want after the ": - ". If so, a simple ```cat theLogFile | sed 's/.*: - //g'``` would do the job. – MASL Dec 10 '17 at 14:49
  • 1
    @MASL: you should lose the habit of piping `cat` to sed/grep/awk/...; all these commands take a "filename" parameter. – Casimir et Hippolyte Dec 10 '17 at 15:27
  • @CasimiretHippolyte you should learn when those details are irrelevant. I live much happier since I did. – MASL Dec 10 '17 at 17:02
  • 1
    @MASL: This one is never irrelevant since many people like you copy and repeat it. – Casimir et Hippolyte Dec 10 '17 at 17:39
  • @CasimiretHippolyte You are not paying attention: it's you who is repeating a comment thinking it brings anything to the discussion. It is perfectly ok and even elegant to use such a pipe *if you know when and where*. – MASL Dec 10 '17 at 18:22
  • @MASL: if you say it.... – Casimir et Hippolyte Dec 10 '17 at 18:56
  • For intermediate-level shell users (newbies and advanced level may safely ignore it): see these comments on the usage of ```cat```, http://porkmail.org/era/unix/award.html – MASL Dec 10 '17 at 23:09
  • @MASL you can of course use `cat file | command` if you like, but you should understand and expect that other people WILL point out that it is redundant, and not be offended when they do, given that programming that way has its very own abbreviation ([UUOC](https://www.acronymfinder.com/Useless-Use-of-Cat-(UUOC).html)) and web sites dedicated to it, e.g. http://porkmail.org/era/unix/award.html. – Ed Morton Dec 10 '17 at 23:10
  • @EdMorton Thanks for your comment. I understand it and that's why I included that link myself. That said, my previous comment *is what/how people should point this out*: (1) It is arrogant to personalize a comment (say, to assume you know what habit others may need to lose or not) and (2) people here throw this comment around **without** thinking why it might be useless and/or wrong, and **when that is completely irrelevant**. – MASL Dec 10 '17 at 23:17
  • To anybody: for an excellent discussion in stackoverflow on the uses of cat see https://stackoverflow.com/questions/11710552/useless-use-of-cat?noredirect=1&lq=1 – MASL Dec 11 '17 at 00:04
  • The good and bad thing about SO is that anyone can post anything, whether they really understand the subject matter or not. The only really useful/concrete info on that post is the answer by JonathanLeffner and the associated comments by CharlesDuffy, both of whom have an excellent understanding of shell. A missing point on that thread is that many commands (e.g. grep and awk) can use the file name for producing output, making decisions, etc. when called as `command file`, but obviously are deprived of that when called as `cat file | command`. – Ed Morton Dec 11 '17 at 04:02
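
To illustrate the last point in the comments above, here is a minimal sketch (not from any commenter) comparing the two call styles with awk's built-in `FILENAME` variable. The scratch directory and file are hypothetical and created only for the demonstration; what awk reports for piped input varies by implementation, so the sketch only checks that it is not the real file name.

```shell
#!/bin/sh
# Hypothetical scratch file, created only for this demonstration.
tmpdir=$(mktemp -d)
logfile="$tmpdir/theLogFile"
printf '%s\n' "<DEBUG> - This line contains information" > "$logfile"

# Called as `command file`, awk knows which file it is reading.
with_name=$(awk '{ print FILENAME }' "$logfile")

# Fed through a pipe, awk only sees standard input and cannot
# report the original file name.
piped=$(cat "$logfile" | awk '{ print FILENAME }')

printf 'direct: [%s]\n' "$with_name"
printf 'piped:  [%s]\n' "$piped"

rm -rf "$tmpdir"
```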

2 Answers

2

Simple sed approach:

sed -E 's/^(<[A-Z]+>).*:( - .*)/\1\2/' logfile

The output:

<DEBUG> - This line contains information
<DEBUG> - Result received: 'com.lala.lulu.SomeClass@13579ace'
RomanPerekhrest
  • Thanks! Nice to see I wasn't way off, haha... One thing I don't get is, why does it stop at the "right" colon (i.e. the last one), and not after the year? When the engine reads ` 01-Jan-1970:` , it has found the pattern `.*:`, why doesn't it stop there? – Erik Vesterlund Dec 10 '17 at 15:31
  • 1
    @ErikVesterlund, welcome, *why doesn't it stop there?* - the `*` operator is greedy: `.*` matches as many characters as possible, so `.*:` runs up to the last `:` that still lets the rest of the pattern (` - .*`) match – RomanPerekhrest Dec 10 '17 at 15:40
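
The greediness described in the comment above can be seen by running two variants on the second sample line from the question. This is only a sketch: the variable names are made up for illustration, and `sed -E` is assumed to be available (GNU and BSD sed both accept it).

```shell
#!/bin/sh
# Sample line copied from the question.
line="<DEBUG> 01-Jan-1970::01:01:01.012 JavaClassName Lala-Worker-Thread-0: - Result received: 'com.lala.lulu.SomeClass@13579ace'"

# Greedy `.*:` with nothing required after it runs to the LAST colon
# on the line, so "- Result received:" is swallowed as well.
greedy_only=$(printf '%s\n' "$line" | sed -E 's/^(<[A-Z]+>).*:/\1/')

# Requiring " - " right after the colon makes the engine backtrack
# to the last colon that is actually followed by " - ".
anchored=$(printf '%s\n' "$line" | sed -E 's/^(<[A-Z]+>).*:( - .*)/\1\2/')

printf '%s\n' "$greedy_only"
printf '%s\n' "$anchored"
```

The first command reproduces the unwanted result from the question; the second keeps the full message because the ` - ` after the colon pins down which colon ends the prefix.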
0
$ sed -E 's/ ([^:]*:){5}//' file
<DEBUG> - This line contains information
<DEBUG> - Result received: 'com.lala.lulu.SomeClass@13579ace'
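
One way to read the command above (a sketch, not part of the original answer): starting at the first space, it removes exactly five colon-terminated chunks, so it relies on the prefix always containing five colons and on the class and thread names containing none. The function name `strip_prefix` is hypothetical.

```shell
#!/bin/sh
# Remove the first space plus exactly five chunks of the form
# "non-colon characters followed by a colon".
strip_prefix() {
    sed -E 's/ ([^:]*:){5}//'
}

# Sample line copied from the question. The five chunks consumed are:
#   "01-Jan-1970:"  ":"  "01:"  "01:"
#   "01.012 JavaClassName Lala-Worker-Thread-0:"
sample="<DEBUG> 01-Jan-1970::01:01:01.012 JavaClassName Lala-Worker-Thread-0: - Result received: 'com.lala.lulu.SomeClass@13579ace'"

stripped=$(printf '%s\n' "$sample" | strip_prefix)
printf '%s\n' "$stripped"
```

Because `[^:]*` cannot cross a colon, the colon after "Result received" is never reached, and the message survives intact.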
Ed Morton