I have a string like this:

"Some standard text CONST_INSIDE_QUOTES" blah blah CONST "There might be another quotes"

I want to replace every constant in the string with some text, but the replacement must not be applied to constants inside quoted text. I have this regex:

sed "s/([A-Z][A-Z0-9_]*)([^a-z])/<span class=\"const\">\1<\/span>\2/g"

which of course matches all constants. Any ideas how to stop it from applying to constants inside quotes? Unfortunately, I can only use sed...

milanseitler
  • please provide your desired output and include the commas in your sample input that would otherwise cause problems – SiegeX Mar 15 '11 at 23:26
  • "Some standard text CONST_INSIDE_COMMAS" blah blah CONST "There might be another commas" – milanseitler Mar 15 '11 at 23:27
  • I don't fully understand what you mean by `constants inside text in commas` Since the sed command must deal with these commas, it would more helpful for us to provide you an answer if you actually provided a real-world example with your desired output than use pseudo code. – SiegeX Mar 15 '11 at 23:31
  • `read(3, "ogpid=30589 0 0\nFIK/XBRADA08.STU"..., 1024)` I want to edit my regex so it doesn't apply on FIK, XBRADA08 and STU – milanseitler Mar 15 '11 at 23:37
  • By "commas" I assume you mean quotes? I.e. a comma is this: , whereas a quote is this: " – srgerg Mar 15 '11 at 23:40
  • Aha, yes he does mean ***quotes*** not ***commas*** – SiegeX Mar 15 '11 at 23:41
  • Yes, I apologize, too late to have correct english, my mistake :-] – milanseitler Mar 15 '11 at 23:44
  • This is going to be very hard with `sed`. Are you sure you can't use `awk`? Most systems that have `sed` also have `awk` – SiegeX Mar 15 '11 at 23:47
  • Well I can but I use sed in whole script so I don't want to mess these two tools together. I thought that something like \(\".*\"\)... would work but it doesn't – milanseitler Mar 15 '11 at 23:49
  • If `sed` supported *look-behind assertions* as `perl` does, then this would be much easier. In fact, `perl -pe s///` might be the way to go for this. It's like `sed` on steroids. – SiegeX Mar 15 '11 at 23:57
  • Note that in British English "inverted commas" refers to quotation marks. – Dennis Williamson Mar 16 '11 at 00:48
  • Siege: I know what you mean, it's powerful thing, however i can't use perl Dennis: This might be the reason of my mistake :) – milanseitler Mar 16 '11 at 06:59

2 Answers

Ok, it's not pretty but it works as long as you don't have nested quotes.

That is to say:

blah "foo" blah "bar" OK
"blah "foo" blah "bar" blah" NOT OK

It uses the double quote as the field separator and then performs its substitutions only on odd-numbered fields (selected via the % operator), which are the ones outside the quotes. This essentially sidesteps the balanced-parentheses problem as long as you don't have nested quotes.

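awk -F'"' '{
  # odd-numbered fields lie outside the quotes; gensub() requires GNU awk
  for(i=1;i<=NF;i++)   # <= so that text after the last quote is processed too
    if(i%2)
      $i=gensub(/([[:upper:]][[:upper:][:digit:]_]*)/,"<span class=\"const\">\\1</span>","g",$i)
}1' OFS='"'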
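
To see why only the odd-numbered fields are touched, it can help to print how awk splits a line on the double quote. The following is a small sketch, not part of the solution itself; the function name `show_quote_fields` is made up for illustration.

```shell
# Hypothetical helper: list every "-separated field of each input line,
# labelled by whether it sits outside (odd) or inside (even) the quotes.
show_quote_fields() {
  awk -F'"' '{
    for (i = 1; i <= NF; i++)
      printf "%d %s [%s]\n", i, (i % 2 ? "outside" : "inside"), $i
  }'
}

printf '%s\n' 'blah "foo" blah "bar"' | show_quote_fields
```

Running the same helper on the nested example, `"blah "foo" blah "bar" blah"`, labels `foo` and `bar` as "outside", which is why nested quotes are not handled.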

Proof of Concept

$ echo 'read(3, "ogpid=30589 0 0\nFIK/XBRADA08.STU"..., 1024); blah blah C3434ONST "some other text"'  | awk -F'"' '{for(i=1;i<=NF;i++)if(i%2)$i=gensub(/([[:upper:]][[:upper:][:digit:]_]*)/,"<span class=\"const\">\\1</span>","g",$i)}1' OFS='"'
read(3, "ogpid=30589 0 0\nFIK/XBRADA08.STU"..., 1024); blah blah <span class="const">C3434ONST</span> "some other text"
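
Since `gensub()` is a GNU awk extension, here is a possible variant of the same odd-field idea that should run on any POSIX awk. It is only a sketch under a few assumptions: the function names `highlight_consts` and `wrap` are invented here, the pattern `[A-Z][A-Z0-9_]*` is borrowed from the question's regex, and `LC_ALL=C` is set so that `[A-Z]` means plain ASCII uppercase.

```shell
# Hypothetical helper: wrap constants that occur outside double quotes.
# Uses only POSIX awk features (match, substr) instead of gawk's gensub().
highlight_consts() {
  LC_ALL=C awk -F'"' -v OFS='"' '
    # Return s with every run matching [A-Z][A-Z0-9_]* wrapped in a span.
    function wrap(s,    out) {
      out = ""
      while (match(s, /[A-Z][A-Z0-9_]*/)) {
        out = out substr(s, 1, RSTART - 1)
        out = out "<span class=\"const\">" substr(s, RSTART, RLENGTH) "</span>"
        s = substr(s, RSTART + RLENGTH)
      }
      return out s
    }
    {
      # Odd-numbered fields are outside the quotes; even ones are skipped.
      for (i = 1; i <= NF; i += 2)
        $i = wrap($i)
      print
    }'
}

printf '%s\n' '"Some standard text CONST_INSIDE_QUOTES" blah blah CONST "There might be another quotes"' | highlight_consts
```

Like the original, this treats any uppercase letter outside quotes as the start of a constant, so a capitalized word such as `Blah` would have its `B` wrapped.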
SiegeX
  • Is awk really the only way to achieve this? – milanseitler Mar 16 '11 at 07:02
  • Certainly not the *only* way, other full featured languages such as perl, python, php, ruby etc. could do it as well; although I think awk (and perhaps perl) would do it best. – SiegeX Mar 16 '11 at 07:26
  • Sure but I can't use those languages, it's a bash script which should use only basic tools...I'll give it a try with awk. Thanks. – milanseitler Mar 16 '11 at 07:32

A well-known problem with regular expressions is matching balanced parentheses, which is equivalent to the problem you are facing of matching balanced quotes (which you've called commas in your question).

What you want is to know that there are either zero, or an even number of quotes before the constant in your regular expression. Unfortunately, regular expressions aren't designed to count characters in this way. See the answer to this question for more information.
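
The "even number of quotes before the constant" test can be sketched outside of a regex, for example with awk. This is only an illustration of the counting idea; the function name `is_outside_quotes` and the environment variable names are assumptions, and it only looks at the first occurrence of the given text.

```shell
# Hypothetical helper: succeed (exit 0) when the first occurrence of $2 in $1
# is preceded by an even number of double quotes, i.e. lies outside quotes.
is_outside_quotes() {
  # Pass values via the environment so awk does not process backslash escapes.
  HAYSTACK="$1" NEEDLE="$2" awk 'BEGIN {
    s = ENVIRON["HAYSTACK"]
    pos = index(s, ENVIRON["NEEDLE"])
    if (pos == 0) exit 2              # text not found at all
    prefix = substr(s, 1, pos - 1)
    n = gsub(/"/, "", prefix)         # gsub returns how many quotes it removed
    exit (n % 2)                      # even count gives exit status 0
  }'
}

line='"Some standard text CONST_INSIDE_QUOTES" blah blah CONST "There might be another quotes"'
is_outside_quotes "$line" ' CONST ' && echo "CONST is outside quotes"
is_outside_quotes "$line" 'CONST_INSIDE_QUOTES' || echo "CONST_INSIDE_QUOTES is inside quotes"
```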

Community
srgerg