Extract part of a curl return in Bash to allocate to a variable

Question

I would like to extract a string value from a web page returned by curl in a Bash script, but I am unsure how to go about it.

The page returned by curl always looks like this, and it contains the value I am interested in:

    <head>
    <title>UKIPVPN.COM FREE VPN Service</title>
    <style type='text/css'>
      #button {
        width:180px;
        height:60px;
        font-family:verdana,arial,helvetica,sans-serif;
        font-size:20px;
        font-weight: bold;
      }
    </style>
  </head>
  <br>
  <br>
     <font color=blue><center>  <h1>Welcome to Free UK IP VPN Service</h1>               </center></font>

     <form method='post' action='http://www.ukipvpn.com'>
  <center><input type='hidden' name='sessionid' value='4b5q43mhhgl95nsa9v9lg8kac7'></center><br>
  <center><input id='button' type='submit' value='  I AGREE  ' /><br><br>     <h2> Your TOS Let me use the Free VPN Service</h2></center>
     </form>



       <br><center><font size='2'>No illegal activities allowed. In case of abuse, users' VPN access log is subjected to expose to related authorities.</font></center>
       </html>

The value I would like to extract into a Bash variable is the text inside the quotes of a value attribute, i.e. value='this is the value I am interested in'.

Thanks for any help.

Andy

Sorry, I am new to bash scripting. Should I allocate the entire curl return to a variable and then run your grep command on that variable? Could you expand slightly please? — andy, Feb 25 '15 at 07:16
Your question is not how to parse curl output, but how to parse HTML. As explained by @that other guy, regex (grep) is generally not suited to parsing arbitrary HTML (though it is sometimes appropriate for a limited, known set of HTML). Depending on the context, you should consider requesting another URL that returns a more structured format such as XML or JSON. — mcoolive, Feb 25 '15 at 10:37

Avinash Raj · Accepted Answer · 2015-02-25T07:28:59.380

1

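You could try the following. It relies on GNU grep's -P (PCRE) option: -o prints only the matched text, and \K discards everything matched before it, so only the sessionid value is captured.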

    val=$(curl somelink | grep -oP "name='sessionid'[^<>]*\bvalue\s*=\s*'\K[^']*")
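
For anyone wondering whether the page has to be stored first: it does not, but it can be. The following is a minimal sketch of the two-step variant, assuming GNU grep with PCRE support; the function name extract_sessionid is made up for illustration, and somelink is the same placeholder URL as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the hidden 'sessionid' input
# found in the HTML passed as $1. Requires GNU grep with -P support.
extract_sessionid() {
  local html=$1
  # -o prints only the match; \K drops everything matched before it,
  # so only the text between the quotes after value= is printed.
  grep -oP "name='sessionid'[^<>]*\bvalue\s*=\s*'\K[^']*" <<<"$html"
}

# Usage (somelink is a placeholder URL):
#   html=$(curl -s somelink)            # step 1: keep the whole page
#   val=$(extract_sessionid "$html")    # step 2: pull out the one value
#   echo "$val"
```

Storing the page first is mainly useful when more than one value has to be pulled from the same response; otherwise the one-line pipeline does the same job.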

edited Feb 25 '15 at 07:28

answered Feb 25 '15 at 07:21


It seems to pick up the 'I AGREE' value instead of the session id value. – andy Feb 25 '15 at 07:26
Nailed it! Thank you! – andy Feb 25 '15 at 07:32
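
Where grep has no -P option (the BSD grep shipped with macOS, for example), a similar match can be sketched with Bash's built-in regex operator. This is a swapped-in technique rather than the answer's own: the function name extract_sessionid_bash is hypothetical, and the pattern is rewritten as POSIX ERE, which has no \K, \b or \s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sessionid value from the HTML passed as $1,
# using only Bash's built-in regex matching (POSIX ERE).
extract_sessionid_bash() {
  local html=$1
  # Keep the pattern in a variable so its quotes are not parsed by [[ ]].
  local re="name='sessionid'[^<>]*value[[:space:]]*=[[:space:]]*'([^']*)'"
  if [[ $html =~ $re ]]; then
    # BASH_REMATCH[1] holds the first parenthesised group: the value itself.
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1   # no sessionid input found
  fi
}

# Usage (somelink is a placeholder URL):
#   val=$(extract_sessionid_bash "$(curl -s somelink)") || echo "no session id" >&2
```

The same caveat applies as for the grep version: it only holds for this known, fixed page layout.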

score 1 · Answer 2 · edited May 23 '17 at 12:20

There are some arguments against using regex to parse HTML.

Here's a more robust XPath-based version using tidy and xmlstarlet:

    var=$(curl someurl |
      tidy -asxml 2> /dev/null |
      xmlstarlet sel -t -v '//_:input[@name="sessionid"]/@value' 2> /dev/null)
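
Because both tools send their errors to /dev/null, a missing tool or a changed page simply yields an empty variable. The sketch below wraps the same pipeline in a function that checks for the tools first; the function name extract_sessionid_xpath and the exit code 127 are choices made for this illustration, not part of either tool.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: read HTML on stdin, print the sessionid value.
extract_sessionid_xpath() {
  local tool
  # Refuse to run if either external tool is missing, rather than
  # silently returning an empty value.
  for tool in tidy xmlstarlet; do
    command -v "$tool" >/dev/null 2>&1 || {
      printf 'missing required tool: %s\n' "$tool" >&2
      return 127
    }
  done
  # Same pipeline as above: tidy converts the HTML to XHTML, then
  # xmlstarlet selects the value attribute of the sessionid input.
  tidy -asxml 2>/dev/null |
    xmlstarlet sel -t -v '//_:input[@name="sessionid"]/@value' 2>/dev/null
}

# Usage (someurl is a placeholder URL):
#   var=$(curl -s someurl | extract_sessionid_xpath)
#   [[ -n $var ]] || echo "no session id found" >&2
```

Checking that the variable is non-empty before using it catches both the missing-tool case and a page that no longer contains the input.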

answered Feb 25 '15 at 07:48

that other guy

