I've looked at lots of similar questions and answers, but I'm hitting a brick wall.

I have an XML file with a line like this:

<blah:formProperty id="_blah" default="%HTML%">

I need to replace %HTML% with about 200+ lines like this:

&lt;style&gt;
blah
&lt;/style&gt;
&lt;script&gt;
blah
&lt;/script&gt;

Using sed throws an error because the replacement text spans multiple lines.

awk seems like a better choice, but can't figure out how to get it done.

The question "Replace a word with multiple lines using sed?" is close, but I can't get its awk example to work. How is $DATA defined such that 'echo $DATA' returns multiple lines? There are tons of forum topics on this, and all say that only

echo "$DATA" 

will print multiple lines.

So this is really a two-part question. How do I solve my problem above? And how did they get that awk example to work?

Paul Ericson

1 Answer

How is $DATA defined such that 'echo $DATA' returns multiple lines?

Quote your multiple lines of text. For example:

$ DATA='&lt;style&gt;
blah
&lt;/style&gt;
&lt;script&gt;
blah
&lt;/script&gt;'

Now if you echo the variable, you’ll get

$ echo "$DATA"
&lt;style&gt;
blah
&lt;/style&gt;
&lt;script&gt;
blah
&lt;/script&gt;
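
To make the difference between the quoted and unquoted forms concrete, here is a minimal sketch. The three-line value is only sample data, and the names unquoted and quoted are made up for illustration. It captures what echo prints with and without double quotes around the variable:

```shell
# Hypothetical sample data: three lines stored in one variable.
DATA='&lt;style&gt;
blah
&lt;/style&gt;'

# Unquoted: the shell splits the value on whitespace (newlines included),
# and echo joins the resulting words with single spaces.
unquoted=$(echo $DATA)

# Quoted: the value reaches echo as one argument, newlines intact.
quoted=$(echo "$DATA")

echo "$unquoted"
echo "$quoted"
```

Without the quotes, echo receives three separate words and prints them on a single line, so in an ordinary shell only the quoted form keeps the line breaks.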

awk seems like a better choice, but can't figure out how to get it done.

Now that you have a variable defined, you can use that variable in awk by doing:

awk -v var="$DATA" '{sub(/%HTML%/,var)}1' file.xml 

$ cat file.xml 
<blah:formProperty id="_blah" default="%HTML%">

$ awk -v var="$DATA" '{sub(/%HTML%/,var)}1' file.xml 
<blah:formProperty id="_blah" default="%HTML%lt;style%HTML%gt;
blah
%HTML%lt;/style%HTML%gt;
%HTML%lt;script%HTML%gt;
blah
%HTML%lt;/script%HTML%gt;">

Now you must be wondering why you get %HTML% in the replacement. This is because & is a special character in the replacement string: it tells the sub function to insert the matched text, which in our case is %HTML%. To avoid this you need to escape it. Using \\& lets sub insert a literal &. Using \& is treated as a plain &, which you don’t want either.

$ DATA='\\&lt;style\\&gt;
blah
\\&lt;/style\\&gt;
\\&lt;script\\&gt;
blah
\\&lt;/script\\&gt;'

$ awk -v var="$DATA" '{sub(/%HTML%/,var)}1' file.xml 
<blah:formProperty id="_blah" default="&lt;style&gt;
blah
&lt;/style&gt;
&lt;script&gt;
blah
&lt;/script&gt;">
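
With 200+ lines, adding the backslashes by hand gets tedious. As a rough sketch, sed could add them for you. The name escaped and the shortened sample data are assumptions for illustration, and this assumes the text contains no backslashes of its own:

```shell
# Hypothetical sample data, shortened from the example above.
DATA='&lt;style&gt;
blah
&lt;/style&gt;'

# Turn every & into \\& . The -v assignment reduces \\ to \, and sub()
# then reads \& as a literal ampersand.
# Assumes DATA contains no backslashes of its own.
escaped=$(printf '%s\n' "$DATA" | sed 's/&/\\\\&/g')

# Assumes an awk that accepts newlines in -v values (gawk does).
result=$(printf '%s\n' '<blah:formProperty id="_blah" default="%HTML%">' |
  awk -v var="$escaped" '{sub(/%HTML%/,var)}1')

printf '%s\n' "$result"
```

The original DATA stays untouched, so the same variable can still be used anywhere the unescaped text is needed.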

Update:

The OP is using the awk that ships with OSX, which doesn’t accept variables with embedded newlines, so I am updating the answer with the workaround suggested by mklement0 in the comments.

awk -v var="${DATA//$'\n'/\\n}" '{sub(/%HTML%/,var)}1' file.xml 
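
If the escaping rules of sub and -v keep getting in the way, another option is to avoid both. The following is only a sketch of that idea, not something from the original answer: it passes the text through the environment and splices it in with index and substr, so & is never special and embedded newlines are not an issue for -v. The sample data is shortened, and only the first %HTML% on each line is replaced, as with sub.

```shell
# Hypothetical sample data, shortened from the example above.
DATA='&lt;style&gt;
blah
&lt;/style&gt;'

result=$(printf '%s\n' '<blah:formProperty id="_blah" default="%HTML%">' |
  DATA="$DATA" awk '
    {
      # index() does a plain string search, so no regex is involved.
      i = index($0, "%HTML%")
      if (i > 0) {
        # Splice the text in by concatenation; "%HTML%" is 6 characters long.
        $0 = substr($0, 1, i - 1) ENVIRON["DATA"] substr($0, i + 6)
      }
      print
    }')

printf '%s\n' "$result"
```

ENVIRON is part of POSIX awk, so this should not depend on gawk. Since sub is never called, it may also sidestep implementation limits on the replacement string, though I have not checked that against every awk.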
jaypal singh
  • You wrapped the var in dbl quotes for your echo. That's not what the example I gave does. I asked how to get echo to display a multi-line var without double quotes. – Paul Ericson Feb 13 '14 at 03:37
  • I get this error with the \\&s and the awk: awk: newline in string \\<style\\> #p... at source line 1. This is basically the same error I was getting with sed too. – Paul Ericson Feb 13 '14 at 03:43
  • @PaulEricson: Using unquoted variable references by definition replaces newlines with spaces (word splitting). Your problem is that OSX `awk` (unlike `gawk`, which @jaypalsingh assumes) doesn't accept variables with embedded newlines. You can work around that with passing `-v var="${DATA//$'\n'/\\n}"` to `awk` - you still need to escape the `&` chars as described in this answer, though. – mklement0 Feb 13 '14 at 03:46
  • @mklement0 Thanks for the workaround. I probably should have mentioned the answer used `gawk`. – jaypal singh Feb 13 '14 at 03:52
  • so the previous error was with the data only having \&. When I change the data to have \\&. I get this error: awk: newline in string \\\<style\\\> ... at source line 1 – Paul Ericson Feb 13 '14 at 03:56
  • @PaulEricson Define the `awk` variable like `-v var="${DATA//$'\n'/\\n}"` – jaypal singh Feb 13 '14 at 03:57
  • jaypalsingh: You're welcome - now that we know it's about OSX, perhaps you could update your answer with the workaround; @PaulEricson: glad to hear it. – mklement0 Feb 13 '14 at 04:02
  • Got solution working under MacOS. Trying to get it working under linux and getting awk limit error: awk: program limit exceeded: replacement pieces size=255 So I think I need a non-awk solution as even if recompiling awk could fix the problem, I can't expect any one else adopting my solution to have to build a custom awk binary. Perhaps a little more background. I'm working in Eclipse. This awk substitution is running in a custom Builder shell script. Perhaps there is a better way to sub a large text block for a single word? – Paul Ericson Feb 19 '14 at 02:40
  • The problem is with awk's "var". I changed the line to: `awk '{sub(/%HTML%/,"$html")}1' file.xml` And the error went away. – Paul Ericson Feb 19 '14 at 03:00