def ExtractViewState(string):
    m = re.match("__viewstate[^>]+value=\"\(\?<Value>[^\"]*\)", string, re.IGNORECASE)
    return m.group(0)

I think I'm missing something, but m keeps coming back as None.

UPDATE:

<input type="hidden" name="__VIEWSTATE" value="5vzj+3s4pEHFJUQoOJbZicZdf+k2bi0uiXeIxMNTxjocu0FLzTXEI8pEcQy/V4r1vtIP6G/E0/j0C5TwvhaWdW1wJVGwGKfO26gvQk9O0zsxy5NBpx+PlfL5h7nlnAp+GmAIwdjLWxRFFbhxaOfH+yZQKfkzshBvE7xogxrTnrrlF22BiENHdWHuMqeGYb4AUfvbbJ2psQOwTTOF6meAjszLtaAxBVTgun4gVsGOKUDqasgzyYn7AsxsJ4rJ3S/64YU2sUwAsvCD1d0X3Q8bGiwriRU/pAo31xn4SfhP8dk22QbhFbVpvIwl3WGTxohL" />

It should return just the text in the value attribute:

"5vzj+3s4pEHFJUQoOJbZicZdf+k2bi0uiXeIxMNTxjocu0FLzTXEI8pEcQy/V4r1vtIP6G/E0/j0C5TwvhaWdW1wJVGwGKfO26gvQk9O0zsxy5NBpx+PlfL5h7nlnAp+GmAIwdjLWxRFFbhxaOfH+yZQKfkzshBvE7xogxrTnrrlF22BiENHdWHuMqeGYb4AUfvbbJ2psQOwTTOF6meAjszLtaAxBVTgun4gVsGOKUDqasgzyYn7AsxsJ4rJ3S/64YU2sUwAsvCD1d0X3Q8bGiwriRU/pAo31xn4SfhP8dk22QbhFbVpvIwl3WGTxohL"

itwb
  • Can you add a string on which you want to match. Without an example, it's impossible to answer. – Eric Fortin Mar 09 '11 at 02:28
  • Don't use regexes for HTML! http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – user470379 Mar 09 '11 at 02:48
  • Have you debugged that the value of "string" is what you expect? Is it the input tag as shown? – David W Mar 09 '11 at 02:50
  • This is the only HTML parsing I am doing in the whole program. I don't think it warrants importing a whole other library to do this one thing... thoughts? – itwb Mar 09 '11 at 02:56
  • There's nothing terribly awful about using regular expressions to parse *a single self-contained HTML tag* as opposed to an actual structured piece of HTML with things nested inside other things. But you really, really should read the link user470379 provided anyway. – Gareth McCaughan Mar 09 '11 at 03:49
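For readers who would rather avoid a regex without pulling in a third-party library, here is a minimal sketch using `html.parser` from the Python 3 standard library. The names `ViewStateParser` and `extract_viewstate_parsed` are made up for illustration, and the sketch assumes the field is an `<input>` whose `name` is exactly `__VIEWSTATE`.

```python
from html.parser import HTMLParser


class ViewStateParser(HTMLParser):
    """Remembers the value of the first <input name="__VIEWSTATE"> it sees."""

    def __init__(self):
        super().__init__()
        self.viewstate = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute names arrive lowercased.
        attr_map = dict(attrs)
        if (tag == "input" and attr_map.get("name") == "__VIEWSTATE"
                and self.viewstate is None):
            self.viewstate = attr_map.get("value")


def extract_viewstate_parsed(html):
    # Hypothetical helper: returns the value, or None if the field is absent.
    parser = ViewStateParser()
    parser.feed(html)
    return parser.viewstate


# Shortened stand-in for the real value shown in the question.
sample = '<input type="hidden" name="__VIEWSTATE" value="abc123+/=" />'
print(extract_viewstate_parsed(sample))
```

Self-closing tags such as `<input ... />` are routed through `handle_starttag` by default, so no extra handler is needed for them.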

2 Answers


You have a few issues:

import re

def ExtractViewState(string):
    # re.match looks only at the **beginning** of the string; re.search scans all of it
    # don't escape the `( .. )`: unescaped parentheses create the capture group
    m = re.search("__viewstate[^>]+value=\"([^\"]*)", string, re.IGNORECASE)
    # group(0) is the whole match; you want the 1st capture group
    return m.group(1)
Jochen Ritzel

Three problems.

  1. You need re.search, not re.match.

  2. You need (?P<...>), not just (?<...>).

  3. You have more backslashes than you need.

    re.search("__viewstate[^>]+value=\"(?P<Value>[^\"]*)", s, re.IGNORECASE)

works for me.
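A small sketch of the three points above, run against a shortened stand-in for the input tag from the question (the value `abc123+/=` is invented for brevity):

```python
import re

html = '<input type="hidden" name="__VIEWSTATE" value="abc123+/=" />'
pattern = "__viewstate[^>]+value=\"(?P<Value>[^\"]*)"

# 1. re.match only tries position 0, where the text is "<input", so it fails.
match_result = re.match(pattern, html, re.IGNORECASE)
print(match_result)

# re.search scans forward until the pattern fits somewhere in the string.
m = re.search(pattern, html, re.IGNORECASE)
print(m.group("Value"))

# 2. Python spells named groups (?P<name>...); a bare (?<name>...) is rejected.
try:
    re.compile("(?<Value>x)")
    bare_name_ok = True
except re.error:
    bare_name_ok = False
print(bare_name_ok)

# 3. Escaped parentheses match literal "(" and ")" characters, so the
#    original pattern cannot match text that contains no parentheses.
escaped = re.search("value=\"\\([^\"]*\\)", html)
print(escaped)
```

The named group is also reachable by position, so `m.group(1)` and `m.group("Value")` return the same text here.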

user470379
Gareth McCaughan
  • `[^>]+` doesn't greedily match "value="...? – user470379 Mar 09 '11 at 02:51
  • @user470379: That's only a problem if the greedy group can eat other matches, ie `re.findall("a.+a", "aba aca")` has *one* match instead of two. Here the `[^>]` prevents that because a match has to end with the tag. – Jochen Ritzel Mar 09 '11 at 02:58
  • And if you use a raw string, you don't need _any_ backslashes: `r'__viewstate[^>]+value="(?P<Value>[^"]*)"'`. We almost always want raw strings for Python regexes. Also, your capture is `m.group('Value')` – oylenshpeegul Mar 09 '11 at 02:59
  • I was thinking `pattern = 'VIEWSTATE"[ ]+value="([^"]+)'` but I didn't test it. – David W Mar 09 '11 at 03:03
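Pulling the comments above together, here is a hedged sketch of a raw-string version that also returns `None` instead of raising when the field is missing. The names `VIEWSTATE_RE` and `extract_viewstate` are illustrative and not from the thread, and the sample value is shortened.

```python
import re

# Raw string: the double quotes need no backslashes inside r'...'.
VIEWSTATE_RE = re.compile(r'__viewstate[^>]+value="(?P<Value>[^"]*)"', re.IGNORECASE)


def extract_viewstate(html):
    """Return the __VIEWSTATE value, or None when no such field is found."""
    m = VIEWSTATE_RE.search(html)
    return m.group("Value") if m else None


sample = '<input type="hidden" name="__VIEWSTATE" value="abc123+/=" />'
print(extract_viewstate(sample))
print(extract_viewstate("<p>no hidden field here</p>"))

# The greedy-quantifier example from the comments: `.+` swallows the middle
# "a"s, giving one long match, while the lazy `.+?` stops at the first chance.
greedy = re.findall("a.+a", "aba aca")
lazy = re.findall("a.+?a", "aba aca")
print(greedy, lazy)
```

In the viewstate pattern the greedy `[^>]+` is harmless because it cannot cross the closing `>` of the tag, so it backtracks to the `value="` inside the same tag.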