I am trying to figure out what the regular expression would be for me to find the following within a massive string, and extract the value that's inside the value field - the value will always be a mixture of both numbers and letters. The length of the value will vary and I want to ignore case.

<input type="text" name="NAME_ID" value="id2654580" maxlength="25">

So in the above example, I would get 'id2654580' as the value, if that control/text was located within my massive string.

Trevor
marcusstarnes
  • The input string looks like it is HTML. You should use an HTML parser for this, as a regex for something like this would be very error-prone. – LB2 Feb 21 '14 at 15:07
  • If your file is valid XML, then you would be better off searching it as XML rather than just a string. – Will Dean Feb 21 '14 at 15:08
  • If this is HTML, there should be some HTML helper libraries that are better suited than just regex. If it's an XML file, there's XDocument or XmlDocument. Any reason why you do not want to use those? – default Feb 21 '14 at 15:08
  • The [HAP](http://www.nuget.org/packages/HtmlAgilityPack) would help you deal with cases where you have `value = "id2654580"` or `value= 'id2654580'` or others that are all valid or "tolerated" HTML but where a too-specific regex might fail to match. – Paolo Falabella Feb 21 '14 at 15:11
  • If you are dealing with a ton of XML, .NET already has libraries to do this. Take a look at [XDocument](http://msdn.microsoft.com/en-us/library/system.xml.linq.xdocument(v=vs.110).aspx). There is also LINQ for XML. – Josh Bowden Feb 21 '14 at 15:11

4 Answers

As the comments on the question already pointed out: you shouldn't use regex to parse HTML!

But since you're curious what it would look like, your regex would be something like:

<input.*value="(.+?)".*>

This would get you the value(s) of the input tag(s), if there are any specified.

<input   #matches "<input" literally
.*       #matches zero or more characters (greedy)
value="  #matches 'value="' literally
(.+?)    #captures one or more characters, as few as possible
"        #matches " literally
.*       #same as above
>        #matches > literally

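In C#:

using System.Text.RegularExpressions;

string str = "<input type=\"text\" name=\"NAME_ID\" value=\"id2654580\" maxlength=\"25\">";
// The named group "val" captures the attribute value; IgnoreCase covers <INPUT ... VALUE="...">.
Regex re = new Regex(@"<input.*value=""(?<val>.+?)"".*>", RegexOptions.IgnoreCase);

Match match = re.Match(str);
// Groups["val"].Value is an empty string when the pattern does not match.
string value = match.Groups["val"].Value;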
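One caveat with the pattern above: `.*` is greedy, so when several input tags sit on the same line the capture comes from the last `value="..."` on that line, not the first. A minimal sketch of a tighter variant follows, assuming the value is always double-quoted; the class and method names (`InputValueSketch`, `FirstValue`) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class InputValueSketch
{
    // [^>]*? keeps the match inside a single tag, \s before "value" avoids
    // attributes such as data-value, and IgnoreCase covers <INPUT ... VALUE="...">.
    static readonly Regex InputValue = new Regex(
        @"<input\b[^>]*?\svalue\s*=\s*""(?<val>[^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns the value of the first matching input tag, or null when nothing matches.
    public static string FirstValue(string html)
    {
        Match match = InputValue.Match(html);
        return match.Success ? match.Groups["val"].Value : null;
    }
}
```

Calling `InputValueSketch.FirstValue(str)` on the sample string gives `id2654580`. This is still a regex over HTML, so single-quoted or unquoted values would need a parser or a broader pattern.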
KeyNone
  • Wouldn't this retrieve the whole `input` node? The OP is looking for only the value string. – default Feb 21 '14 at 15:13
  • @Default it will _match_ on the whole input node, but only _capture_ the value. If you didn't match on the whole input field, you would get the values of all nodes (if any are specified), and I understood the OP as wanting only the values from input fields. – KeyNone Feb 21 '14 at 15:15
  • Cool. I'm not too familiar with regex, which is why I'm asking. Could you show how this would be used in a C# program then? – default Feb 21 '14 at 15:16

If you are only looking for the value, I would use:

Regex reg = new Regex(@"value=""(?<value>[^""]+)""", RegexOptions.IgnoreCase);

string value = null;

// Match once and check Success, rather than calling IsMatch and then Match.
Match m = reg.Match(inputstring);
if (m.Success)
{
  value = m.Groups["value"].Value;
}
pquest

Your regex should be:

/value="([^"]+)"/i

Here is a demo: http://rubular.com/r/tCj4WEtBZa
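The `/.../i` literal above is Ruby-style syntax (the demo is on Rubular). In C# the trailing `i` would roughly translate to `RegexOptions.IgnoreCase`. A minimal sketch that collects every match is below; `AllValuesSketch` and `AllValues` are hypothetical names, and the pattern matches any `value="..."` in the string regardless of which tag it belongs to.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AllValuesSketch
{
    // Same pattern as /value="([^"]+)"/i; the trailing i becomes RegexOptions.IgnoreCase.
    static readonly Regex ValueAttr =
        new Regex(@"value=""([^""]+)""", RegexOptions.IgnoreCase);

    // Collects every captured value in the order it appears in the input.
    public static List<string> AllValues(string html)
    {
        var values = new List<string>();
        foreach (Match m in ValueAttr.Matches(html))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }
}
```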

Rickert
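static string GetValue(string str, string name)
{
    // Regex.Escape keeps any special characters in name from being read as pattern syntax.
    var rx = new Regex(@"<input\s+type=""text""\s+name=""" + Regex.Escape(name) + @"""\s+value=""(?<value>[^""]+)""\s+maxlength=""25"">", RegexOptions.IgnoreCase);
    return rx.Match(str).Groups["value"].Value;
}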

Usage:

    var str = @"<input type=""text"" name=""NAME_ID"" value=""id2654580"" maxlength=""25"">";
    var value = GetValue(str, "NAME_ID");  //id2654580
nima