1

Hello, I am trying to save a value from an input tag in some HTML source code. The tag looks like this:

<input name="user_status" value="3" />

I have the page source in a variable (pageSourceCode), and need to work out some regex to get the value (3 in this example). I have this so far:

Dim sCapture As String = System.Text.RegularExpressions.Regex.Match(pageSourceCode, "\<input\sname\=\""user_status\""\svalue\=\""(.*)?\""\>").Groups(1).Value

This works fine most of the time. However, this code is used to process source code from multiple sites (that use the same platform), and sometimes the input tag includes other attributes, or the attributes are in a different order, e.g.:

<input class="someclass" type="hidden" value="3" name="user_status" />

I just don't understand regex well enough to cope with these situations.

Any help would be very much appreciated.

PS: Although I am looking for a specific answer to this question if at all possible, a pointer to a good regex tutorial would be great as well.

Thanks

Steve
  • You may want to look at this folklore question :) http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Pero P. Feb 16 '11 at 16:20
  • Yeah, I'm looking into using Html Agility Pack, but this 'seemed' like overkill for this small project. – Steve Feb 16 '11 at 17:12

2 Answers

1

You can search for <input[^>]*\bvalue="([^"]+)" if your input tags never contain angle brackets.

[^>]* matches any number of characters except >, which keeps the regex from accidentally matching across tags.

\b ensures that we only match value and not something like x_value.
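As a rough illustration of the two points above, here is a minimal VB.NET sketch. The module name and the FirstInputValue helper are made up for this example, and the sample strings are the two tags from the question plus one invented x_value tag.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ValuePatternDemo
    ' Hypothetical helper (not from the original post): returns the value
    ' attribute of the first input tag, or "" if nothing matches.
    Function FirstInputValue(html As String) As String
        ' Regex: <input[^>]*\bvalue="([^"]+)"  (quotes doubled for VB)
        Dim pattern As String = "<input[^>]*\bvalue=""([^""]+)"""
        Dim m As Match = Regex.Match(html, pattern)
        Return If(m.Success, m.Groups(1).Value, "")
    End Function

    Sub Main()
        ' [^>]* skips over whatever attributes come before value=,
        ' so both attribute orders from the question are handled.
        Console.WriteLine(FirstInputValue("<input name=""user_status"" value=""3"" />"))
        Console.WriteLine(FirstInputValue("<input class=""someclass"" type=""hidden"" value=""3"" name=""user_status"" />"))
        ' No word boundary exists between "_" and "v", so x_value is ignored.
        Console.WriteLine(FirstInputValue("<input x_value=""9"" />") = "")
    End Sub
End Module
```

Note that this version still returns the first input tag's value on the page, whatever its name is.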

EDIT:

If you only want to look at input tags where name="user_status", then you can do this with an additional lookahead assertion:

<input(?=[^>]*name="user_status")[^>]*\bvalue="([^"]+)"

In VB.NET:

ResultString = Regex.Match(SubjectString, "<input(?=[^>]*name=""user_status"")[^>]*\bvalue=""([^""]+)""").Groups(1).Value
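To see how the lookahead picks the right tag out of several, here is a small self-contained sketch. The InputValueByName helper, the module name, and the sample page are assumptions made for illustration. It also adds \b before name and passes the name through Regex.Escape, neither of which is part of the pattern above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadDemo
    ' Hypothetical helper: returns the value attribute of the first input tag
    ' whose name attribute equals inputName, or "" if there is none.
    Function InputValueByName(html As String, inputName As String) As String
        ' The lookahead (?=[^>]*\bname="...") checks, without consuming text,
        ' that the wanted name appears somewhere inside the same tag.
        Dim pattern As String = _
            "<input(?=[^>]*\bname=""" & Regex.Escape(inputName) & """)" & _
            "[^>]*\bvalue=""([^""]+)"""
        Return Regex.Match(html, pattern).Groups(1).Value
    End Function

    Sub Main()
        ' Made-up sample page: the first tag has a different name and is skipped.
        Dim page As String = _
            "<input name=""token"" value=""abc"" />" & _
            "<input class=""someclass"" type=""hidden"" value=""3"" name=""user_status"" />"
        Console.WriteLine(InputValueByName(page, "user_status"))
    End Sub
End Module
```

Because the lookahead is anchored right after <input and [^>]* cannot cross a >, the name check and the value capture always refer to the same tag.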

A good tutorial can be found at http://www.regular-expressions.info

Tim Pietzcker
  • Thanks, looking at the tutorial now. Your example helps, but unfortunately grabs the 1st tag (of many) and not the name="user_status" one. I am wondering if it would be better to grab the whole tag (if it contains "user_status") then run a second regex to get the value? – Steve Feb 16 '11 at 17:15
  • Are you just looking for `input` tags that contain `user_status="name"`? No problem. Will edit. – Tim Pietzcker Feb 16 '11 at 18:38
  • A bit back to front (name="user_status", not user_status="name"), but I got it based on your example, thank you very much. – Steve Feb 16 '11 at 18:47
0

Assuming this is an ASP.NET page and not some external HTML you can't control, the better solution would be to simply access the control directly.

Add an ID field and runat="server" to your input control, like this:

<input id="user_status" runat="server" class="someclass" type="hidden" value="3" name="user_status" />

You can probably get rid of the Name field. It's typically the same as the ID field and ID is a better choice. You can actually have both an ID and Name field if you want and they can both be the same value.

In your code-behind, you can then access the value by ID, with no need for a regex:

Me.user_status.value
Daniel Knoodle