Avoid using regex to parse HTML; there are too many pitfalls. Suppose you search your document for the string "<title". What if the document contains "<TITLE" instead? OK, that one is easy: do a case-insensitive match. But what if a "<title" string is embedded in a comment? What if one is embedded in a script block? And so on.
Any search of an HTML document needs to do more than plain text search; it needs to be document-aware. That's what the HtmlAgilityPack provides. It's a free download.
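To make the pitfall concrete, here is a minimal sketch using only the standard Regex class. The sample HTML and the names RegexPitfallDemo and CountTitleMatches are made up for illustration. The page has one real title element, but a case-insensitive text search for "<title" also hits the copy inside the comment and the one inside the script string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPitfallDemo
{
    // Counts raw text matches of "<title", ignoring case.
    // This knows nothing about comments, scripts, or document structure.
    public static int CountTitleMatches(string html)
    {
        return Regex.Matches(html, "<title", RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        // Hypothetical sample page: only the <TITLE> element is a real title.
        string html =
            "<html><head>\n" +
            "<!-- <title>old title</title> -->\n" +
            "<script>var s = \"<title>fake</title>\";</script>\n" +
            "<TITLE>Real title</TITLE>\n" +
            "</head><body></body></html>";

        // Reports every textual hit, including the comment and the script string.
        Console.WriteLine("Text matches for \"<title\": " + CountTitleMatches(html));
    }
}
```

You could keep patching the pattern to skip comments and scripts, but at that point you are writing a parser by hand.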
Start with something like this:
using HtmlAgilityPack;
....
HtmlDocument doc = new HtmlDocument();
doc.Load(fileName);
var titles = doc.DocumentNode.SelectNodes("/html/head/title");
if (titles != null)
{
    foreach (var title in titles)
    {
        Console.WriteLine("<title> on line: " + title.Line);
    }
    var scripts = doc.DocumentNode.SelectNodes("/html/head/script");
    if (scripts != null)
    {
        foreach (var script in scripts)
        {
            Console.WriteLine("<script> on line: " + script.Line);
            // here, you need to decide if the script is before the title
            // and if it is the "right" script - google analytics.
            // you have to do that part yourself.
        }
    }
    else
    {
        Console.WriteLine("No script nodes found.");
    }
}
else
{
    Console.WriteLine("No title node found.");
}
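For the part left to you in the comments, here is one possible sketch, not a definitive implementation. It assumes that "before" means the script starts earlier in the source than the title (compared using the Line and LinePosition properties of HtmlNode), and that a Google Analytics script can be recognized by a host name in its src attribute or inline body. The host list and the names HeadOrderCheck, LooksLikeGoogleAnalytics and StartsBefore are my own assumptions; adjust them to the snippet you actually use.

```csharp
using System;
using HtmlAgilityPack;

public static class HeadOrderCheck
{
    // Assumed heuristic: the script is "Google Analytics" if its src or
    // inline body mentions one of these hosts. Extend the list as needed.
    public static bool LooksLikeGoogleAnalytics(string src, string body)
    {
        string text = (src ?? "") + " " + (body ?? "");
        return text.IndexOf("google-analytics.com", StringComparison.OrdinalIgnoreCase) >= 0
            || text.IndexOf("googletagmanager.com", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // True if node a starts earlier in the source text than node b.
    public static bool StartsBefore(HtmlNode a, HtmlNode b)
    {
        if (a.Line != b.Line)
        {
            return a.Line < b.Line;
        }
        return a.LinePosition < b.LinePosition;
    }

    public static void Main(string[] args)
    {
        var doc = new HtmlDocument();
        doc.Load(args[0]); // path to the HTML file

        var title = doc.DocumentNode.SelectSingleNode("/html/head/title");
        var scripts = doc.DocumentNode.SelectNodes("/html/head/script");
        if (title == null || scripts == null)
        {
            Console.WriteLine("Missing <title> or <script> in <head>.");
            return;
        }

        foreach (var script in scripts)
        {
            string src = script.GetAttributeValue("src", "");
            if (!LooksLikeGoogleAnalytics(src, script.InnerHtml))
            {
                continue; // some other script, ignore it
            }
            string where = StartsBefore(script, title) ? "before" : "after";
            Console.WriteLine("Analytics script on line " + script.Line
                + " is " + where + " the <title>.");
        }
    }
}
```

Because the comparison runs on parsed nodes, a "<title" inside a comment or a script string never enters the picture.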