I want to change this piece of code: $(perl -p -e 's/\$(\w+)/("$ENV{$1}")/eg' < file_with_html)

I want to do the parsing and substitution without perl. file_with_html contains an HTML template, for example:

      <th class='xtr-0-0'>Version name</th>
      <td class='xtr-0-1'>$RELEASE_TAG</td>
    </tr>
    <tr class='xtr-1'>
      <th class='xtr-1-0'>Link</th>
      <td class='xtr-1-1'>$RELEASE_URL</td>...

I need to replace every $(\w+) with the value of the environment variable of the same name, and send the resulting "parsed" template in a POST request. How can I do that with grep/sed/awk/etc.?

UPD1: I send the POST with curl.

The full script, which generates a new page in Confluence:

newPageTemplate=$(perl -p -e 's/\$(\w+)/("$ENV{$1}")/eg' < $CONFLUENCE_PAGE_TEMPLATE)

newPageContent="{
    \"type\": \"page\",
    \"title\": \"$CONFLUENCE_PAGE_TITLE\",
    \"ancestors\": [
        {
            \"id\": \"$CONFLUENCE_PARENTPAGE_ID\"
        }
    ],
    \"space\": {
        \"key\": \"$CONFLUENCE_SPACE\"
    },
    \"body\": {
        \"storage\": {
            \"value\": \"$(echo ${newPageTemplate})\",
            \"representation\": \"storage\"
        }
    }
}"
curl --request POST \
--url $CONFLUENCE_API_URL \
--header "authorization: Basic $JIRA_TOKEN" \
--header "content-type: application/json" \
--data "$newPageContent"
Zash
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest to use an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Jun 26 '19 at 16:15
  • Then why did you tag it with Perl? – Grinnz Jun 26 '19 at 16:15
  • Please add sample input (no descriptions, no images) and your desired output for that sample input to your question (no comment). – Cyrus Jun 26 '19 at 16:17
  • Hi, welcome to stack overflow. What have you tried? – Jeroen Heier Jun 26 '19 at 16:25
  • [edit] your question to show the expected output given that input. YMMV if you're relying on us being able to read that perl script to figure out **exactly** what it is you need to output. – Ed Morton Jun 26 '19 at 17:10
  • What POST request? Are you running shell scripts in your web server? Why?! – melpomene Jun 26 '19 at 18:30

2 Answers

The original perl is much simpler but it can probably be done in awk.

Perl's \w matches slightly more than [0-9a-zA-Z_] (see: https://metacpan.org/pod/perlrecharclass#Word-characters) but I'll assume that's all that will appear in an environment variable name (which also cannot begin with a digit).

POSIX AWK

awk '
    {
        n = split( $0, f, /[^$0-9a-zA-Z_]+/ )
        for ( i=1; i<=n; i++ ) {
            v = f[i]
            if ( v ~ /^[$][a-zA-Z_]/ ) {
                sub( /^[$]/, "", v )
                sub( "[$]"v, ENVIRON[v] )
            }
        }
    }
    1
' file_with_html
  • split extracts potential environment variable references
  • v ~ /.../ matches just the valid ones
  • first sub removes the leading $
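  • second sub puts the $ back in front of the name (written as [$] so it is matched literally rather than treated as a regex anchor) and replaces the first occurrence of that variable reference in the line with the value (if any)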
  • 1 prints every line

Warning: The code above has a subtle bug. The substitutions are applied one after another rather than simultaneously, so in the pathological case where a line needs multiple substitutions and the value of one looks like a reference to a later one, the wrong part of the line is changed. E.g., if A='$B' and B='x' and the line contains .. $A .. $B .., the output will be .. x .. $B .. rather than the intended .. $B .. x ..
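The pathological case can be reproduced with a short sketch. The wrapper name `naive_expand` is only an illustrative assumption; the awk program inside it is the POSIX version from above, reading standard input instead of a file.

```shell
# Hypothetical wrapper around the POSIX awk program shown above.
naive_expand() {
    awk '
    {
        n = split( $0, f, /[^$0-9a-zA-Z_]+/ )
        for ( i=1; i<=n; i++ ) {
            v = f[i]
            if ( v ~ /^[$][a-zA-Z_]/ ) {
                sub( /^[$]/, "", v )
                sub( "[$]"v, ENVIRON[v] )   # always hits the FIRST "$name" left in the line
            }
        }
    }
    1
    '
}

# The value of A looks like a reference to B.
export A='$B' B='x'

printf '%s\n' '.. $A .. $B ..' | naive_expand
# prints: .. x .. $B ..
```

The first pass turns `$A` into `$B`, and the second pass then replaces that freshly inserted `$B` instead of the original one further along the line.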

GNU GAWK

The bug can be fixed if gawk extensions are used. For example, with patsplit:

gawk '
   {
      if (n = patsplit($0, f, "[$][a-zA-Z_][0-9a-zA-Z_]*", s)) {
         printf "%s", s[0]
         for ( i=1; i<=n; i++ ) {
            sub( /^[$]/, "", f[i] )
            printf "%s%s", ENVIRON[ f[i] ], s[i]
         }
         printf "\n"
      }
      else print
   }
' file_with_html
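If gawk is not available, the same single-pass idea can probably be expressed in POSIX awk with match() and substr(). This is a minimal sketch, not part of the original answer: the function name `expand_env` is hypothetical, and it assumes variable names consist of [0-9a-zA-Z_] and do not start with a digit.

```shell
# Hypothetical helper: expands $NAME references from the environment in one pass.
expand_env() {
    awk '
    {
        out  = ""
        rest = $0
        # Consume the line left to right, so substituted values are never rescanned.
        while ( match( rest, /[$][a-zA-Z_][0-9a-zA-Z_]*/ ) ) {
            name = substr( rest, RSTART + 1, RLENGTH - 1 )            # name without the "$"
            out  = out substr( rest, 1, RSTART - 1 ) ENVIRON[name]    # text before match, then value
            rest = substr( rest, RSTART + RLENGTH )                   # continue after the match
        }
        print out rest
    }
    ' "$@"
}

export A='$B' B='x'
printf '%s\n' '.. $A .. $B ..' | expand_env
# prints: .. $B .. x ..
```

Because values are concatenated rather than passed through sub(), characters such as & and \ in a value are copied as-is. An unset variable expands to an empty string.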
jhnc

Welcome to Stack Overflow.

I need change all $(\w+) on ENVs with same name. How i can do that with grep/sed/awk/etc?

First, grep, sed, and awk are not "pure shell". Those are commands installed on the system -- just like perl is. If you can install perl, I recommend it; it's terribly useful. If not, your next best approach might be copying the source files to a machine where you can install perl, and running the process as-is, since the end result is to POST the results back to Confluence (i.e., you can POST from off-system).

But, if you really, absolutely cannot use perl, of course there is a way to do this with other text-processing tools like grep, sed, and awk.

Second, what have you tried so far? Stack Overflow works best when you show us what you've done so far, what works, and in what ways it doesn't work. That helps people focus their responses according to your exact problem and tailored to your current understanding.

Third, here is a script that might guide you. Again, I don't know how much bash you know, so this might be a bit obscure. It's also certainly not the only way to do this, and quite probably not the best. I highly recommend testing this by running it on a few sample pages and inspecting the output before POST'ing it; tools like diff or vimdiff will help a lot here. Then, even when you are ready to actually POST the results, start slowly with a subset, and validate the results before opening up the firehose.


WARNING: the below script breaks under common circumstances

As noted by @jhnc in the comments, the sed command will fail when the replacement text contains characters that are replacement-metacharacters for sed (such as '/' in a URL). There is a way to compensate with further script logic, but IMHO down that path lies madness.
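To make the failure concrete, here is a small sketch that isolates the same sed substitution. The helper name `substitute` and the sample values are assumptions made for illustration only.

```shell
# Hypothetical helper: $1 = input line, $2 = replacement value.
# The sed command is the one used in the script below.
substitute() {
    printf '%s\n' "$1" | sed "s/\$[a-zA-Z0-9_]*/$2/"
}

# A value containing '/' ends the s/// command early, so sed exits with an error.
if ! substitute '<td>$RELEASE_URL</td>' 'https://example.com/releases/1.2.3' 2>/dev/null; then
    echo "sed failed"
fi

# A value containing '&' is accepted, but '&' expands to the matched text.
substitute '<td>$RELEASE_TAG</td>' 'R&D'
# prints: <td>R$RELEASE_TAGD</td>
```

The second case is arguably worse than the first, because the output is silently wrong instead of producing an error.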

My recommendation if perl cannot be installed on the target machine is my "next best approach" that I mentioned above: copy the input data to a machine where you can run perl, and run the transformation and POST back to Confluence from there.

But also take a look at the answer from @jhnc, which offers a solution that avoids this weakness in mine.

(I'm placing this warning here instead of deleting my answer, because of the simpler solutions above, and in case this approach serves as a basis for someone who wishes to improve upon it.)

USE THE BELOW APPROACH WITH A GREAT DEAL OF CAUTION


replace-env-params.sh

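#!/bin/bash
# Usage: ./replace-env-params.sh TEMPLATE_FILE
# Limitations: handles at most one $NAME per line, and the value must not
# contain sed replacement metacharacters such as '/', '&' or '\'.

while IFS= read -r LINE; do

    # Does this line contain something that looks like $NAME?
    MATCH=$(echo "$LINE" | grep -E '\$[a-zA-Z0-9_]+')
    if [[ -n "$MATCH" ]]; then
        # Extract the variable name (without the leading $)
        ENVPARAM=$(echo "$LINE" | sed 's/^.*\$\([a-zA-Z0-9_]*\).*$/\1/')

        # Look up the value of the variable with that name (indirect expansion)
        REPLACE="${!ENVPARAM}"

        # Substitute the value for the reference
        LINE=$(echo "$LINE" | sed "s/\$[a-zA-Z0-9_]*/$REPLACE/")
    fi

    echo "$LINE"

done < "$1"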

cat somehtml

      <th class='xtr-0-0'>Version name</th>
      <td class='xtr-0-1'>$RELEASE_TAG</td>
    </tr>
    <tr class='xtr-1'>
      <th class='xtr-1-0'>Link</th>
      <td class='xtr-1-1'>$RELEASE_URL</td>

testing ...

export RELEASE_TAG=11111111
export RELEASE_URL=22222222

./replace-env-params.sh somehtml

      <th class='xtr-0-0'>Version name</th>
      <td class='xtr-0-1'>11111111</td>
    </tr>
    <tr class='xtr-1'>
      <th class='xtr-1-0'>Link</th>
      <td class='xtr-1-1'>22222222</td>

Thus, you can replace:

newPageTemplate=$(perl -p -e 's/\$(\w+)/("$ENV{$1}")/eg' < $CONFLUENCE_PAGE_TEMPLATE)

with

newPageTemplate=$(./replace-env-params.sh $CONFLUENCE_PAGE_TEMPLATE)
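For anyone who wants to build on this approach, the sed step can probably be avoided altogether by letting bash do the matching and the concatenation. The following is a minimal sketch under stated assumptions, not a drop-in replacement: it requires bash (not plain sh), the function names `expand_env_line` and `expand_env_file` are hypothetical, names must not start with a digit (unlike `\w+` in the perl), and `${!name}` sees every shell variable, not only exported ones.

```shell
#!/bin/bash

# Hypothetical helper: expand every $NAME in one line, left to right.
expand_env_line() {
    local rest=$1 out='' name match
    local re='\$([a-zA-Z_][a-zA-Z0-9_]*)'
    while [[ $rest =~ $re ]]; do
        match=${BASH_REMATCH[0]}        # e.g. $RELEASE_URL
        name=${BASH_REMATCH[1]}         # e.g. RELEASE_URL
        out+=${rest%%"$match"*}         # text before the first match
        out+=${!name}                   # value, copied verbatim (empty if unset)
        rest=${rest#*"$match"}          # continue after the match
    done
    printf '%s\n' "$out$rest"
}

# Hypothetical helper: apply expand_env_line to each line of the file named in $1.
expand_env_file() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        expand_env_line "$line"
    done < "$1"
}

export RELEASE_TAG='v1.2.3' RELEASE_URL='https://example.com/a&b'
expand_env_line '<td>$RELEASE_TAG</td><td>$RELEASE_URL</td>'
# prints: <td>v1.2.3</td><td>https://example.com/a&b</td>
```

Since values are appended with plain string concatenation, characters such as `/` and `&` need no escaping, and several references on one line are handled in a single pass.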
landru27
  • 1) sed will barf if REPLACE contains special characters; 2) your code doesn't handle multiple variables on a single line – jhnc Jun 28 '19 at 02:14
  • @jhnc : both of those are fair points, and to that extent yes, my solution does not fully emulate the perl being replaced; for the use-case at hand, that might or might not matter; e.g., have you ever seen an environment variable name with anything other than `[a-zA-Z0-9_]`? while other characters are legal, I've never seen it done; and from contextual clues, I'd give good odds that in the actual data, a given line has only one environment variable; I'm not disagreeing with you, but speaking pragmatically, this solution might be "good enough" – landru27 Jun 28 '19 at 14:10
  • @Zash : regarding the comment here from jhnc, if your actual data does in fact have some lines with more than one `$THING_TO_REPLACE`, you can (1) use my script as-is and run a multipass for those input templates, (2) adapt my script as necessary, or (3) let me know and I'll take a crack at adapting it; thx! – landru27 Jun 28 '19 at 14:13
  • I'm talking about the value of the variable, not the name. For example, a url is likely to contain `/` so the sed command becomes `s/.../.../.../ ` – jhnc Jun 28 '19 at 14:27
  • @jhnc : oh, I see what you are saying; that's an excellent point; thank you for catching that and taking the time to point it out; I will update my answer with a warning – landru27 Jun 28 '19 at 14:53
  • @jhnc : now that I see that you have also provided an answer (I didn't scroll far enough before), I will also edit my answer to refer to yours – landru27 Jun 28 '19 at 15:18