1

Currently I have a file with the format as follows:

####<Oct 23, 2015 10:23:34 AM PDT> <ERROR> <com.foo.biz.jrules.ilog.RulesEngine> <BELC02NF206G3QN> <tcbiz2_1> <siteType=DOMESTIC> <catalina-exec-16> <sessionId=432407E73A6BFE1C4AFE8205ED386907> <clientIp=127.0.0.1> <com.foo.biz.jrules.ilog.RulesEngine.mapPricedSearch(?:?):priceRuleDesc=SNSDTA:PRO-18.612782:NOBTA>
####<Oct 23, 2015 10:23:34 AM PDT> <ERROR> <com.foo.biz.jrules.ilog.RulesEngine> <BELC02NF206G3QN> <tcbiz2_1> <siteType=DOMESTIC> <catalina-exec-16> <sessionId=432407E73A6BFE1C4AFE8205ED386907> <clientIp=127.0.0.1> <com.foo.biz.jrules.ilog.RulesEngine.mapPricedSearch(?:?):priceRuleDesc=SNSDTA:PRO-15.806297:NOBTA>
####<Oct 23, 2015 10:23:34 AM PDT> <ERROR> <com.foo.biz.jrules.ilog.RulesEngine> <BELC02NF206G3QN> <tcbiz2_1> <siteType=DOMESTIC> <catalina-exec-16> <sessionId=432407E73A6BFE1C4AFE8205ED386907> <clientIp=127.0.0.1> <com.foo.biz.jrules.ilog.RulesEngine.mapPricedSearch(?:?):priceRuleDesc=SNSDTA:PRO-4.2497005:NOBTA>

I'm trying to strip out everything after the priceRuleDesc= term and before the last > character. Currently, I'm trying to test out a regex in sed on my Mac to accomplish this, but without much luck.

The command I'm using is:

cat ~/myapp/logs/tcbiz2_1.log | grep -i priceRuleDesc | sed -E 's/^.*priceRuleDesc=/foo/'

Surprisingly, the ^.*priceRuleDesc= in my sed command doesn't match, so nothing on the line up to that point gets replaced with foo. I suspect that the ^.* is just walking to the end of the line without being smart enough to stop when priceRuleDesc occurs.

I found a somewhat similar question called Non greedy regex matching in Sed, but I'm not convinced that what's going on in that question is what's going on here, and I would also like to know if there is a sed solution for this. I'm sure this must be a duplicate of some other question here that I'm just not finding, so if anybody could point me to the right question or supply an answer, that would be great. Thanks.

entpnerd
  • You don't need `grep` here; your `sed` will only modify lines that match already. (If you did need the filtering, you could do that in `sed` directly too: `sed -E '/priceRuleDesc/s/..../.../'`.) – Etan Reisner Oct 23 '15 at 18:36
  • Correct, but I need grep to filter out lines that don't have priceRuleDesc on them. I neglected to add that to the sample file. – entpnerd Oct 23 '15 at 18:51
  • Nope. Just use `sed -nE '/priceRuleDesc=/{ s/..../..../;p }'` formatted however your `sed` needs that brace-block. – Etan Reisner Oct 23 '15 at 19:00
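The brace-block form suggested in the comments above could be spelled out roughly as follows. This is only a sketch: the function name `extract_rule_desc` is made up for illustration, and the one-command-per-line layout is chosen on the assumption that it is the form both GNU sed and the BSD sed on a Mac accept most readily.

```shell
# Hypothetical helper: filter and extract in a single sed call, no grep needed.
extract_rule_desc() {
  # -n suppresses default printing, so only lines reaching "p" are emitted.
  sed -nE '/priceRuleDesc=/{
    s/^.*priceRuleDesc=//
    s/>$//
    p
  }'
}

# Example: one matching line and one line that should be filtered out.
printf '%s\n' \
  '####<Oct 23, 2015 10:23:34 AM PDT> <ERROR> <mapPricedSearch(?:?):priceRuleDesc=SNSDTA:PRO-18.612782:NOBTA>' \
  '####<Oct 23, 2015 10:23:35 AM PDT> <INFO> <some other message>' |
  extract_rule_desc
```

With the real log, the same function would be fed from the file instead, for example `extract_rule_desc < ~/myapp/logs/tcbiz2_1.log`.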

4 Answers

2

You can just use an alternation-based regex (`-E` makes `|` work in the BSD sed on a Mac as well as in GNU sed):

sed -E 's/^.*priceRuleDesc=|>$//g' file
SNSDTA:PRO-18.612782:NOBTA
SNSDTA:PRO-15.806297:NOBTA
SNSDTA:PRO-4.2497005:NOBTA

Or using awk:

awk -F 'priceRuleDesc=|>$' '{print $2}' file
SNSDTA:PRO-18.612782:NOBTA
SNSDTA:PRO-15.806297:NOBTA
SNSDTA:PRO-4.2497005:NOBTA
anubhava
  • The problem with that though is that I don't want to substitute anything after the `priceRuleDesc` term, only what comes before it. – entpnerd Oct 23 '15 at 18:34
  • But you wrote: **I'm trying to strip out everything after the priceRuleDesc= term and before the last > character.** What is your expected output for the shown input? – anubhava Oct 23 '15 at 18:36
  • Totally confusing language. My bad. I meant that I wanted to retain only the data that was after priceRuleDesc. Sorry for the confusion. – entpnerd Oct 23 '15 at 18:44
  • So you want `SNSDTA:PRO-18.612782:NOBTA>` in first line as output? – anubhava Oct 23 '15 at 19:09
  • Without the trailing `>` yes. – entpnerd Oct 23 '15 at 21:28
  • Thanks anubhava, your answer definitely would have worked had there not been highlighted characters that I didn't take into account. See my answer below for details. Thanks for your help - it is appreciated, and was definitely useful in troubleshooting to find out what was going on. – entpnerd Oct 24 '15 at 19:04
1

Works fine for me:

mike ~ $ cat foo.txt
####<Oct 23, 2015 10:23:34 AM PDT> <ERROR> <com.foo.biz.jrules.ilog.RulesEngine> <BELC02NF206G3QN> <tcbiz2_1> <siteType=DOMESTIC> <catalina-exec-16> <sessionId=432407E73A6BFE1C4AFE8205ED386907> <clientIp=127.0.0.1> <com.foo.biz.jrules.ilog.RulesEngine.mapPricedSearch(?:?):priceRuleDesc=SNSDTA:PRO-18.612782:NOBTA>
####<Oct 23, 2015 10:23:34 AM PDT> <ERROR> <com.foo.biz.jrules.ilog.RulesEngine> <BELC02NF206G3QN> <tcbiz2_1> <siteType=DOMESTIC> <catalina-exec-16> <sessionId=432407E73A6BFE1C4AFE8205ED386907> <clientIp=127.0.0.1> <com.foo.biz.jrules.ilog.RulesEngine.mapPricedSearch(?:?):priceRuleDesc=SNSDTA:PRO-15.806297:NOBTA>
####<Oct 23, 2015 10:23:34 AM PDT> <ERROR> <com.foo.biz.jrules.ilog.RulesEngine> <BELC02NF206G3QN> <tcbiz2_1> <siteType=DOMESTIC> <catalina-exec-16> <sessionId=432407E73A6BFE1C4AFE8205ED386907> <clientIp=127.0.0.1> <com.foo.biz.jrules.ilog.RulesEngine.mapPricedSearch(?:?):priceRuleDesc=SNSDTA:PRO-4.2497005:NOBTA>
mike ~ $ sed -E 's/^.*priceRuleDesc=/foo/' foo.txt 
fooSNSDTA:PRO-18.612782:NOBTA>
fooSNSDTA:PRO-15.806297:NOBTA>
fooSNSDTA:PRO-4.2497005:NOBTA>
mike ~ $ 

I'd suggest checking the input to sed first.
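One way to do that check is to look at the raw bytes sed would receive rather than at what the terminal renders. The snippet below is a sketch only; the `sample` string is invented for illustration and stands in for a line of whatever is actually being piped into sed.

```shell
# Hypothetical sample: a match followed by invisible escape bytes.
sample=$(printf 'priceRuleDesc\033[m\033[K=value')

# od -c prints non-printing bytes as octal, so an ESC byte shows up as 033.
printf '%s\n' "$sample" | od -c

# cat -v renders control characters in caret notation, so ESC appears as ^[.
visible=$(printf '%s\n' "$sample" | cat -v)
printf '%s\n' "$visible"
```

If anything other than the expected text appears between `priceRuleDesc` and `=`, the regex is failing because of the input, not because of greediness.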

miken32
1

This might work for you (GNU sed):

sed -E '/.*priceRuleDesc=(.*)>$/s//\1/p;d' file

This can replace the grep command too.

potong
1

So I finally figured out what was going on, and I'm posting this answer in case others encounter the same issue. The problem had nothing to do with the .* term in the regex of the sed command; it was entirely down to grep. grep was highlighting the matched priceRuleDesc term, and I hadn't taken that into account. It was highlighting matches because my large ~/.bash_profile (copied and pasted in bulk from somebody else's file at work) contained this line:

export GREP_OPTIONS='--color=auto'

The effect of this option is that when grep matches text, it transforms it by inserting characters that you can't see on standard output. While more aesthetically pleasing, this unfortunately made the output useless to other commands that apply regexes to grep's piped output (here, the sed command). However, you can see these characters with the xxd command:

0015960: 3f29 3a1b 5b30 313b 3331 6d1b 5b4b 7072  ?):.[01;31m.[Kpr
0015970: 6963 6552 756c 6544 6573 631b 5b6d 1b5b  iceRuleDesc.[m.[
0015980: 4b3d 3e0a                                K=>.

You can see the issue here: there are six characters between the last c character and the last = character, and they are part of the escape sequences that create the highlighting effect. After I commented out the GREP_OPTIONS line in my ~/.bash_profile and restarted the terminal, grep finally stopped adding the extraneous characters that had kept the posted regex from matching.
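For anyone who can't change the profile, here is a rough sketch of the failure and of a workaround that strips the color sequences before the real substitution. The sample line is hand-built to mimic the bytes in the xxd dump above, and the stripping pattern is an assumption that only covers sequences ending in m or K, which are the only ones that dump shows.

```shell
# Hand-built stand-in for one line of colorized grep output (ESC is \033).
line=$(printf '<x(?:?):\033[01;31m\033[KpriceRuleDesc\033[m\033[K=SNSDTA:PRO-18.612782:NOBTA>')

# The original substitution: no match, because escape bytes sit between "c" and "=".
unchanged=$(printf '%s\n' "$line" | sed -E 's/^.*priceRuleDesc=/foo/')

# Strip the ESC[...m and ESC[K sequences first, then run the same substitution.
esc=$(printf '\033')
cleaned=$(printf '%s\n' "$line" |
  sed -E "s/${esc}\[[0-9;]*[mK]//g" |
  sed -E 's/^.*priceRuleDesc=/foo/')

printf '%s\n' "$cleaned"
```

A simpler route, when the grep invocation itself can be changed, is to pass `--color=never` to grep for that one pipeline so the sequences are never emitted.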

entpnerd