-1

Suppose I have HTML like this:

<html>
<Head>
<link type="text/css" href="c1.css" rel="stylesheet" />
<link type="text/css" href="c2.css" rel="stylesheet" />
<link type="text/css" href="c3.css" rel="stylesheet" />
<link type="text/css" href="c4.css" rel="stylesheet" />
<link type="text/css" href="c5.css" rel="stylesheet" />

<script type="text/javascript" src="j1.js"></script>
<script type="text/javascript" src="j2.js"></script>
</Head>

<body>

<script type="text/javascript" src="j3.js"></script>
<script type="text/javascript" src="j4.js"></script>

</body>
</html>

I want one regex that returns the details of every link tag and a second regex that returns the details of every script tag. I searched Google but did not find anything suitable. If anyone knows the two regex patterns, please let me know. Thanks.

Keith Costa

4 Answers

2

This answer is the one you're looking for. Do not try to parse HTML with regexes.

Community
Moo-Juice
2

As others have commented, parsing HTML with regexes is not good practice, but it is what you asked for, so here we go:

Regular Expression for `link` tag

@"(?ix)" +
@"<link\s*type=\x22(?'type'.*?)\x22\s*" +
@"href=\x22(?'href'.*?)\x22\s*" +
@"rel=\x22(?'rel'.*?)\x22\s*" +
@"\/>";

Regular Expression for `script` tag

@"(?ix)" + 
@"<script\s*type=\x22(?'type'.*?)\x22\s*" +
@"src=\x22(?'src'.*?)\x22\s*" +
@"><\/script>";

Example

Supposing that you have your HTML in a variable of type string:

using System;
using System.Text.RegularExpressions;

class Program
{
    // (?ix) = ignore case, and ignore whitespace inside the pattern itself.
    // \x22 is a double quote. Attributes must appear in exactly this order.
    public const string LINK_PATTERN =
                            @"(?ix)" +
                            @"<link\s*type=\x22(?<type>.*?)\x22\s*" +
                            @"href=\x22(?<href>.*?)\x22\s*" +
                            @"rel=\x22(?<rel>.*?)\x22\s*" +
                            @"\/>";

    public const string SCRIPT_PATTERN =
                            @"(?ix)" +
                            @"<script\s*type=\x22(?<type>.*?)\x22\s*" +
                            @"src=\x22(?<src>.*?)\x22\s*" +
                            @"><\/script>";

    static void Main(string[] args)
    {
        string html = getBody();

        Regex links = new Regex(LINK_PATTERN);
        Regex scripts = new Regex(SCRIPT_PATTERN);

        // Print each <link> tag followed by its captured attributes.
        foreach (Match link in links.Matches(html))
        {
            Console.WriteLine("<link>: " + link);

            Console.WriteLine("\ttype: " + link.Groups["type"]);
            Console.WriteLine("\thref: " + link.Groups["href"]);
            Console.WriteLine("\trel: " + link.Groups["rel"]);

            Console.WriteLine("");
        }

        // Print each <script> tag followed by its captured attributes.
        foreach (Match script in scripts.Matches(html))
        {
            Console.WriteLine("<script>: " + script);

            Console.WriteLine("\ttype: " + script.Groups["type"]);
            Console.WriteLine("\tsrc: " + script.Groups["src"]);

            Console.WriteLine("");
        }

        Console.ReadKey();
    }

    // Builds the sample HTML from the question.
    public static string getBody()
    {
        string html = "";

        html += "<html>";
        html += "<head>";
        html += "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c2.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c3.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c4.css\" rel=\"stylesheet\" />";
        html += "<link type=\"text/css\" href=\"c5.css\" rel=\"stylesheet\" />";
        html += "<script type=\"text/javascript\" src=\"j1.js\"></script>";
        html += "<script type=\"text/javascript\" src=\"j2.js\"></script>";
        html += "</head>";
        html += "<body>";
        html += "<script type=\"text/javascript\" src=\"j3.js\"></script>";
        html += "<script type=\"text/javascript\" src=\"j4.js\"></script>";
        html += "</body>";
        html += "</html>";

        return html;
    }
}
Eder
1

It is not a good idea to parse HTML with regexes; doing it properly requires a real parser.

While it is possible to make a regex work with the first example text you are given, you will then spend every waking moment changing it to cover each new 'special case' in the next text you have to parse.
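
To make the 'special case' problem concrete, here is a minimal sketch that runs the link pattern posted in another answer on this page against four tags that a browser would treat as equivalent. The names `BrittlenessDemo` and `CountLinkMatches` are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BrittlenessDemo
{
    // Copy of the posted link pattern: expects type, href, rel in that order,
    // double-quoted values, and a self-closing "/>".
    public const string LinkPattern =
        @"(?ix)" +
        @"<link\s*type=\x22(?<type>.*?)\x22\s*" +
        @"href=\x22(?<href>.*?)\x22\s*" +
        @"rel=\x22(?<rel>.*?)\x22\s*" +
        @"\/>";

    // Hypothetical helper: number of <link> tags the pattern finds in the input.
    public static int CountLinkMatches(string html)
    {
        return Regex.Matches(html, LinkPattern).Count;
    }

    public static void Main()
    {
        string original  = "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\" />";
        string reordered = "<link rel=\"stylesheet\" type=\"text/css\" href=\"c1.css\" />";
        string singleQ   = "<link type='text/css' href='c1.css' rel='stylesheet' />";
        string unclosed  = "<link type=\"text/css\" href=\"c1.css\" rel=\"stylesheet\">";

        Console.WriteLine(CountLinkMatches(original));  // 1
        Console.WriteLine(CountLinkMatches(reordered)); // 0: attribute order changed
        Console.WriteLine(CountLinkMatches(singleQ));   // 0: single quotes
        Console.WriteLine(CountLinkMatches(unclosed));  // 0: no self-closing slash
    }
}
```

Each of the three misses would need its own patch to the pattern, which is exactly the maintenance treadmill described above.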

Julian
1

This parser seems popular: HTML Agility Pack
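
As a rough sketch of what that could look like for the question's HTML, assuming the HtmlAgilityPack NuGet package is referenced in the project: the class `TagExtractor` and the helper `GetAttributeValues` are hypothetical names for this example, while `HtmlDocument.LoadHtml`, `SelectNodes`, and `GetAttributeValue` come from the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class TagExtractor
{
    // Hypothetical helper: returns the given attribute from every <tagName>
    // element that has it, in document order.
    public static List<string> GetAttributeValues(string html, string tagName, string attribute)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//" + tagName + "[@" + attribute + "]");
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            values.Add(node.GetAttributeValue(attribute, ""));
        }
        return values;
    }

    public static void Main()
    {
        // Shortened version of the question's HTML, with attributes reordered on purpose.
        string html =
            "<html><Head>" +
            "<link rel=\"stylesheet\" href=\"c1.css\" type=\"text/css\" />" +
            "<link type=\"text/css\" href=\"c2.css\" rel=\"stylesheet\" />" +
            "<script type=\"text/javascript\" src=\"j1.js\"></script>" +
            "</Head><body>" +
            "<script src=\"j3.js\" type=\"text/javascript\"></script>" +
            "</body></html>";

        Console.WriteLine(string.Join(", ", GetAttributeValues(html, "link", "href")));
        Console.WriteLine(string.Join(", ", GetAttributeValues(html, "script", "src")));
    }
}
```

Because the parser works on elements and attributes rather than on raw text, attribute order, quoting style, and tag casing should not need any special handling here.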

wp78de
Steve Wellens