I want to extract the form content for processing.

What I already get from curl with `mycurlcommand | grep "type=\"hidden"` is:

<input type="hidden" name="var1" value="ABC">
<input type="hidden" name="var2" value="DEF">
<input type="hidden" name="var3" value="GHI">
<input type="hidden" name="var4" value="JKL">
<input type="hidden" name="var5" value="">

I want to get this:

var1=ABC
var2=DEF
var3=GHI
var4=JKL
var5=

so that I can process it and pass it back to curl. I am sure this is possible with awk/cut/sed; other XML parsing tools are not available on my limited Linux install (small storage).

Thomas
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Sep 06 '20 at 09:21

2 Answers


Since you mention that XML parsing tools are not available, you can use these solutions. However, they may not work if the input differs from the sample shown in the question. As a bonus, they eliminate the need for the grep command mentioned in the question.

$ # use = or " characters as input field separator
$ # set = as output field separator
$ # print the required fields
$ awk -F'[="]' -v OFS='=' '/type="hidden"/{print $6, $9}' ip.txt
var1=ABC
var2=DEF
var3=GHI
var4=JKL
var5=
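
To see why fields 6 and 9 are the right ones, it can help to print every field awk produces for one sample line. This is only an illustrative sketch: `show_fields` is a hypothetical helper name, and it assumes the input has exactly the attribute layout shown in the question.

```shell
# Print each field awk sees when both = and " act as field separators.
# Adjacent separators (the = followed by ") produce empty fields in between.
show_fields() {
  awk -F'[="]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
}

printf '%s\n' '<input type="hidden" name="var1" value="ABC">' | show_fields
```

For the sample line this lists ten fields, with `var1` as field 6 and `ABC` as field 9. An extra attribute before `name` would shift those positions, which is why the awk version depends on a fixed layout.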

$ # this is useful when number of fields isn't fixed
$ # but the order has to be name followed by value
$ sed -nE '/type="hidden"/ s/.*name="([^"]*)".*value="([^"]*)".*/\1=\2/p' ip.txt
var1=ABC
var2=DEF
var3=GHI
var4=JKL
var5=
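
Since the goal is to pass the pairs back to curl, one possible follow-up is to join the lines with `&` into a single data string. This is a rough sketch, not a tested recipe for your site: `extract_pairs` and `join_pairs` are hypothetical helper names, the URL is a placeholder, and the values are not URL-encoded, so it assumes they contain no special characters.

```shell
# Same sed command as above, wrapped so it can read from a pipe.
extract_pairs() {
  sed -nE '/type="hidden"/ s/.*name="([^"]*)".*value="([^"]*)".*/\1=\2/p'
}

# Join name=value lines into one line: var1=ABC&var2=DEF&...
join_pairs() {
  awk 'NR > 1 { printf "&" } { printf "%s", $0 } END { print "" }'
}

# Hypothetical usage; mycurlcommand and the URL are placeholders:
# data=$(mycurlcommand | extract_pairs | join_pairs)
# curl -d "$data" 'https://example.com/form'
```

If the values can contain characters such as `&` or spaces, passing each pair to curl separately with `--data-urlencode` is likely the safer route.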
Sundeep
  • I like your sed command, but it does not work: it just displays the whole HTML file. Also, I updated my question because the given commands don't honor an empty value. – Thomas Sep 06 '20 at 09:42
  • @Thomas I missed adding `-n` and `p` to the `sed` command. I've updated both commands for the new input, but as I mentioned, the solution depends on the kind of sample you show. – Sundeep Sep 06 '20 at 09:51
  • OK, sorry: the sed and awk commands work now and honor empty values. – Thomas Sep 06 '20 at 09:52

Could you please try the following? Since the OP mentioned that no other tools are available and asked for guidance in awk or shell, I am going with this solution. I am passing Input_file to the awk command; if you are piping the output of your_command into awk instead, change it to your_command | awk ...

awk '
match($0,/name="[^"]*/){
  val1=substr($0,RSTART,RLENGTH)
  match($0,/value="[^"]*/)
  val2=substr($0,RSTART,RLENGTH)
  sub(/.*"/,"",val1)
  sub(/.*"/,"",val2)
  print val1"="val2
  val1=val2=""
}'  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                               ##Start the awk program.
match($0,/name="[^"]*/){            ##Match from name=" up to (not including) the next " in the current line.
  val1=substr($0,RSTART,RLENGTH)    ##Save the matched substring of the current line into val1.
  match($0,/value="[^"]*/)          ##Match from value=" up to (not including) the next " in the current line.
  val2=substr($0,RSTART,RLENGTH)    ##Save the matched substring into val2, using RSTART and RLENGTH from the latest match.
  sub(/.*"/,"",val1)                ##Remove everything up to and including the " from val1.
  sub(/.*"/,"",val2)                ##Remove everything up to and including the " from val2.
  print val1"="val2                 ##Print val1, an = sign, and val2.
  val1=val2=""                      ##Reset val1 and val2.
}' Input_file                        ##Name of the input file.
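
Because name and value are matched independently, this approach should not depend on the order of the attributes. The sketch below is a variation on the same idea, not the exact code above: it skips the `name="` and `value="` prefixes by offset instead of calling sub(), and `pairs_any_order` is a hypothetical function name.

```shell
# Extract name=value pairs regardless of attribute order.
pairs_any_order() {
  awk '
  match($0, /name="[^"]*/) {
    name = substr($0, RSTART + 6, RLENGTH - 6)     # skip the 6 characters of name="
    value = ""                                     # default when no value attribute exists
    if (match($0, /value="[^"]*/))
      value = substr($0, RSTART + 7, RLENGTH - 7)  # skip the 7 characters of value="
    print name "=" value
  }'
}

# Here value comes before name, unlike the sample in the question.
printf '%s\n' '<input value="ABC" type="hidden" name="var1">' | pairs_any_order
```

Like the program above, it does not filter on type="hidden", so it assumes the input has already been reduced to the relevant lines.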
RavinderSingh13
  • This works and also honors an empty value (as shown in my updated question). It just looks somewhat long. – Thomas Sep 06 '20 at 09:45
  • @Thomas, thanks for confirming. I used `match` to make sure the values really are present in the current line. Regarding the length: would you like a one-liner form of it (kindly confirm)? – RavinderSingh13 Sep 06 '20 at 09:48
  • @Thomas, for a one-liner please try the following: `your_command | awk 'match($0,/name="[^"]*/){val1=substr($0,RSTART,RLENGTH);match($0,/value="[^"]*/);val2=substr($0,RSTART,RLENGTH);sub(/.*"/,"",val1);sub(/.*"/,"",val2);print val1"="val2;val1=val2=""}'` – RavinderSingh13 Sep 06 '20 at 09:54