-1

I need to capture small code (token) from html with regex, I'm writing code with BeautifulSoup but it is not possible to compile with py2exe, so for this I need a solution wihth regex. My html code is this:

<form method="post" enctype="multipart/form-data" class="wp-upload-form" action="http://localhost/wp-admin/update.php?action=upload-plugin">
        <input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537"><input type="hidden" name="_wp_http_referer" value="/wp-admin/plugin-install.php?tab=upload">     <label class="screen-reader-text" for="pluginzip">Plugin zip file</label>
        <input type="file" id="pluginzip" name="pluginzip">
        <input type="submit" name="install-plugin-submit" id="install-plugin-submit" class="button" value="Install Now" disabled="">    </form>

and I need to capture this code: a7a9167537

I wrote this regex but it did not work:

id="_wpnonce" name="_wpnonce" value="(.*)"
jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
kingcope
  • 1,121
  • 4
  • 19
  • 36
  • Could you put your code or at least the line with youre regex ? – Luc DUZAN Mar 25 '14 at 11:01
  • soup = BeautifulSoup(html) wp_token = soup.find("input", {'id': "_wpnonce"}).attrs['value'] whit regex i not know, need help :( – kingcope Mar 25 '14 at 11:08
  • 1
    There are other methods of parsing this that are *much* better than [using regex](http://stackoverflow.com/a/1732454/3001761). The standard library includes e.g. [`HTMLParser`](http://docs.python.org/2/library/htmlparser.html). – jonrsharpe Mar 25 '14 at 11:15

4 Answers4

1

Well, maybe something like this?

print(re.search(r'(?=<input(?=[^>]+id="_wpnonce")(?=[^>]*name="_wpnonce")[^>]+value="([^"]+)")', html).group(1))

In BeautifulSoup, you can use:

print(soup.find("input", {"id": "_wpnonce", "name": "_wpnonce"})['value'])
Jerry
  • 70,495
  • 13
  • 100
  • 144
0

Try :

    import re
    searchText = '<form method="post" enctype="multipart/form-data" class="wp-upload-form" action="http://localhost/wp-admin/update.php?action=upload-plugin"><input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537"><input type="hidden" name="_wp_http_referer" value="/wp-admin/plugin-install.php?tab=upload">     <label class="screen-reader-text" for="pluginzip">Plugin zip file</label><input type="file" id="pluginzip" name="pluginzip"><input type="submit" name="install-plugin-submit" id="install-plugin-submit" class="button" value="Install Now" disabled="">    </form>'


    print re.sub("(.+name=\"_wpnonce\"\\ value=\"([\\d\\w]{10})\">.+)", "\\2", searchText)
XciD
  • 2,537
  • 14
  • 40
0

Your regex is greedy . Change it to lazy

Modified Regex : (?<=id="_wpnonce" name="_wpnonce" value=")(.*?)"

Regex Demo

To get exactly the value, Use this regex : (?<=id="_wpnonce" name="_wpnonce" value=").*?(?=")

Regex Demo

Krishna M
  • 1,135
  • 2
  • 16
  • 32
0

You can get it with the HTMLParser module instead of regex:

from HTMLParser import HTMLParser

s = r"""<form method="post" enctype="multipart/form-data" class="wp-upload-form" action="http://localhost/wp-admin/update.php?action=upload-plugin">
        <input type="hidden" id="_wpnonce" name="_wpnonce" value="a7a9167537"><input type="hidden" name="_wp_http_referer" value="/wp-admin/plugin-install.php?tab=upload">     <label class="screen-reader-text" for="pluginzip">Plugin zip file</label>
        <input type="file" id="pluginzip" name="pluginzip">
        <input type="submit" name="install-plugin-submit" id="install-plugin-submit" class="button" value="Install Now" disabled="">    </form>"""


class CodeFinder(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if 'id' in attrs and attrs['id'] == '_wpnonce':
                print("The  value is: {}".format(attrs['value']))

parser = CodeFinder()
parser.feed(s)
khagler
  • 3,996
  • 29
  • 40