python regex token capture

Question

I need to capture a short token from HTML with a regex. My code currently uses BeautifulSoup, but I cannot build it with py2exe, so I need a regex-based solution instead. My HTML is this:

<form method="post" enctype="multipart/form-data" class="wp-upload-form" action="http://localhost/wp-admin/update.php?action=upload-plugin">
        <input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537"><input type="hidden" name="_wp_http_referer" value="/wp-admin/plugin-install.php?tab=upload">     <label class="screen-reader-text" for="pluginzip">Plugin zip file</label>
        <input type="file" id="pluginzip" name="pluginzip">
        <input type="submit" name="install-plugin-submit" id="install-plugin-submit" class="button" value="Install Now" disabled="">    </form>

and I need to capture this code: a7a9167537

I wrote this regex but it did not work:

id="_wpnonce" name="_wpnonce" value="(.*)"

Could you post your code, or at least the line with your regex? – Luc DUZAN, Mar 25 '14 at 11:01
`soup = BeautifulSoup(html)` `wp_token = soup.find("input", {'id': "_wpnonce"}).attrs['value']` With regex I don't know how to do it, need help :( – kingcope, Mar 25 '14 at 11:08
There are other methods of parsing this that are *much* better than [using regex](http://stackoverflow.com/a/1732454/3001761). The standard library includes e.g. [`HTMLParser`](http://docs.python.org/2/library/htmlparser.html). – jonrsharpe, Mar 25 '14 at 11:15

Jerry · Answer 1 · 2014-03-25T15:21:44.553

1

Well, maybe something like this?

print(re.search(r'(?=<input(?=[^>]+id="_wpnonce")(?=[^>]*name="_wpnonce")[^>]+value="([^"]+)")', html).group(1))

In BeautifulSoup, you can use:

print(soup.find("input", {"id": "_wpnonce", "name": "_wpnonce"})['value'])
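For the regex variant above, here is a rough sketch of how it could be wrapped for reuse. It uses a slightly simplified pattern (no outer lookahead) and returns None instead of raising when nothing matches. The names NONCE_RE and extract_nonce are made up for illustration, and the pattern assumes double-quoted attributes as in the question's HTML.

```python
import re

# Hypothetical names; a simplified form of the lookahead pattern above.
# Each lookahead checks for one attribute anywhere inside the same <input ...> tag,
# so the order of id/name/value does not matter.
NONCE_RE = re.compile(
    r'<input(?=[^>]*\sid="_wpnonce")(?=[^>]*\sname="_wpnonce")'
    r'[^>]*\svalue="([^"]*)"'
)


def extract_nonce(html):
    """Return the _wpnonce value, or None if no matching <input> is found."""
    match = NONCE_RE.search(html)
    return match.group(1) if match else None


sample = ('<input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537">'
          '<input type="hidden" name="_wp_http_referer" value="/wp-admin/">')
print(extract_nonce(sample))
```

Because `[^>]*` cannot cross a `>`, the other inputs on the page are skipped rather than matched by accident.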

edited Mar 25 '14 at 15:21

answered Mar 25 '14 at 12:50


I like your regex, but it needs to include _wpnonce because the complete page has more inputs! – kingcope Mar 25 '14 at 13:24
@kingcope Oh, okay, added those to the conditions. – Jerry Mar 25 '14 at 15:21

XciD · Answer 2 · 2014-03-25T11:47:36.173

Try:

    import re
    searchText = '<form method="post" enctype="multipart/form-data" class="wp-upload-form" action="http://localhost/wp-admin/update.php?action=upload-plugin"><input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537"><input type="hidden" name="_wp_http_referer" value="/wp-admin/plugin-install.php?tab=upload">     <label class="screen-reader-text" for="pluginzip">Plugin zip file</label><input type="file" id="pluginzip" name="pluginzip"><input type="submit" name="install-plugin-submit" id="install-plugin-submit" class="button" value="Install Now" disabled="">    </form>'


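    # Replace the whole matched text with the second group (the 10-character token).
    # print() with a single argument works on both Python 2 and Python 3.
    print(re.sub(r'(.+name="_wpnonce" value="(\w{10})">.+)', r'\2', searchText))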
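One thing to watch with the re.sub approach: it only replaces what the pattern matches, and `.` does not match newlines by default. On multi-line HTML the surrounding lines are therefore kept in the result, and if nothing matches at all the input comes back unchanged. The sketch below, using a shortened stand-in for the page (the variable names are only illustrative), contrasts that with re.search, which returns just the captured group.

```python
import re

# Shortened stand-in for the page; note the newlines around the <input> line.
html = (
    '<form>\n'
    '<input id="_wpnonce" name="_wpnonce" value="a7a9167537">'
    '<input name="other" value="x">\n'
    '</form>'
)

# re.sub replaces only the matched line; the rest of the text is kept.
via_sub = re.sub(r'(.+name="_wpnonce" value="(\w{10})">.+)', r'\2', html)

# re.search gives access to the captured group alone, or None if nothing matches.
match = re.search(r'name="_wpnonce" value="(\w{10})"', html)
via_search = match.group(1) if match else None

print(repr(via_sub))
print(via_search)
```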

score 0 · Answer 3 · answered Mar 25 '14 at 11:23

0

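Your regex is greedy: (.*) keeps matching up to the last " it can reach on the line, so it swallows the attributes and tags that follow the token. Make the quantifier lazy so it stops at the first closing quote.

Modified regex: (?<=id="_wpnonce" name="_wpnonce" value=")(.*?)"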

To match exactly the value and nothing else, use this regex: (?<=id="_wpnonce" name="_wpnonce" value=").*?(?=")
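To illustrate the difference, here is a small sketch using a shortened copy of the HTML from the question (the variable names are only for illustration). It uses re.search rather than re.sub, since the goal is to extract the value, not to replace it.

```python
import re

# Shortened copy of the relevant part of the question's HTML.
html = ('<input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537">'
        '<input type="hidden" name="_wp_http_referer" '
        'value="/wp-admin/plugin-install.php?tab=upload">')

# Greedy: .* runs to the last double quote in the string.
greedy = re.search(r'id="_wpnonce" name="_wpnonce" value="(.*)"', html).group(1)

# Lazy: .*? stops at the first double quote after the value.
lazy = re.search(r'id="_wpnonce" name="_wpnonce" value="(.*?)"', html).group(1)

# Lookbehind/lookahead: the whole match (group 0) is the value itself.
exact = re.search(r'(?<=id="_wpnonce" name="_wpnonce" value=").*?(?=")', html).group(0)

print(greedy)
print(lazy)
print(exact)
```

Note that all three patterns depend on the attributes appearing in exactly this order and with single spaces between them.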

answered Mar 25 '14 at 11:23

Krishna M


print re.sub("(?<=id="_wpnonce" name="_wpnonce" value=")(.*?)"", text) ^ SyntaxError: invalid syntax – kingcope Mar 25 '14 at 11:27
Working correctly for me `re.sub(r'(?<=id="_wpnonce" name="_wpnonce" value=").*?(?=")','',text)` – Krishna M Mar 25 '14 at 11:32
@kingcope Have you got it? – Krishna M Mar 25 '14 at 11:37
pastebin.com/ptPj7JEK is not working, I need to print only this code: a7a9167537 – kingcope Mar 25 '14 at 12:06

score 0 · Accepted Answer · answered Mar 25 '14 at 14:56

You can get it with the HTMLParser module instead of regex:

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

s = r"""<form method="post" enctype="multipart/form-data" class="wp-upload-form" action="http://localhost/wp-admin/update.php?action=upload-plugin">
        <input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537"><input type="hidden" name="_wp_http_referer" value="/wp-admin/plugin-install.php?tab=upload">     <label class="screen-reader-text" for="pluginzip">Plugin zip file</label>
        <input type="file" id="pluginzip" name="pluginzip">
        <input type="submit" name="install-plugin-submit" id="install-plugin-submit" class="button" value="Install Now" disabled="">    </form>"""


class CodeFinder(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if attrs.get('id') == '_wpnonce':
                # .get avoids a KeyError if the tag has no value attribute
                print("The value is: {}".format(attrs.get('value')))

parser = CodeFinder()
parser.feed(s)
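If the token is needed as a value in the rest of the program rather than printed, the same idea could be adapted as in the sketch below. This is only one possible arrangement: it assumes Python 3 (where the module lives at html.parser), and the names NonceParser and find_nonce are hypothetical.

```python
from html.parser import HTMLParser  # Python 3 location of the module


class NonceParser(HTMLParser):
    """Hypothetical parser that remembers the first _wpnonce value it sees."""

    def __init__(self):
        super().__init__()
        self.nonce = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "input" and self.nonce is None:
            attrs = dict(attrs)
            if attrs.get("id") == "_wpnonce":
                self.nonce = attrs.get("value")


def find_nonce(html):
    """Return the _wpnonce value from html, or None if it is absent."""
    parser = NonceParser()
    parser.feed(html)
    parser.close()
    return parser.nonce


page = '<form><input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537"></form>'
print(find_nonce(page))
```

Since this relies only on the standard library, it avoids the BeautifulSoup dependency mentioned in the question.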
