I need to match strings like the ones below, which are shown in a window title:

8% of setup_av_free.exe from software-files-l.cnet.com Completed

98% of test.zip from 65.55.72.119 Completed

[numeric]% of [filename] from [hostname | IP address] Completed

I have written half of the regex pattern:

if (Regex.IsMatch(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s]"))
    MessageBox.Show(text);

and I now need to integrate the following two regexes into the code above:

ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";  

ValidHostnameRegex = "^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$"; 

The two regexes were taken from this link. Both work well when I use Regex.IsMatch to match "123.123.123.123" and "software-files-l.cnet.com". However, I cannot get them to work when I integrate them into my existing regex code. I tried several variants without success. Can someone guide me on how to integrate the two regexes into my existing code? Thanks in advance.

abduls85

4 Answers

You can certainly combine all these regular expressions into one, but I'd recommend against it. Consider this method: first it checks whether your input text has the correct overall form, then it checks whether the "from" part is an IP address or a hostname.

bool CheckString(string text) {
    const string ValidIpAddressRegex = @"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";  

    const string ValidHostnameRegex = @"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$"; 

    var match = Regex.Match(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s](\S+)");
    if(!match.Success)
        return false;        

    string address = match.Groups[3].Value;

    return Regex.IsMatch(address, ValidIpAddressRegex) ||
           Regex.IsMatch(address, ValidHostnameRegex); 
}

It does what you want and is much more readable than a single monster-sized regular expression. Unless you are going to call this method millions of times in a loop, there is no reason to worry about it being less performant than a single regex.

Also, in case you aren't aware, the brackets around \d and \s aren't necessary.

Dyppl
  • I already did something like this some time back and it works. But I need to get all these regular expressions into one for efficiency, and to avoid occupying memory by declaring additional strings. The input text will always be valid, as it comes from the title bars of IE's currently active downloads on the computer. – abduls85 Jun 09 '11 at 14:53
  • @abduls85: why do you check it then? – Dyppl Jun 09 '11 at 14:57
  • The text will contain all the open windows on the desktop. I just need to filter out the current download jobs. Thanks for the input. – abduls85 Jun 09 '11 at 15:04
  • @abduls85: anyway, I find it hard to believe that the performance requirements for this task are so tight that you need to use one and only one regular expression (and yet not so tight as to rule out regex altogether) – Dyppl Jun 09 '11 at 15:07
  • This is a minor piece of functionality in the multithreaded application that I am currently developing. Hence I need to be a bit stringent. – abduls85 Jun 09 '11 at 15:17

The reason those two regexes do not match your string is that they start with ^ and end with $.

^ matches the start of the string (or of each line if the m modifier, RegexOptions.Multiline in .NET, is activated)
$ matches the end of the string (or of each line if the m modifier is activated)

When you test them against a bare IP address or hostname, both anchors are satisfied. In your real text, however, the address sits in the middle of the string, so nothing matches.

Try removing the ^ at the very beginning and the $ at the very end.
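To illustrate the effect of the anchors, here is a minimal sketch using the IPv4 pattern from the question. The class and method names (AnchorDemo, IsWholeStringIp, ContainsIp) are made up for this example and do not come from the original code.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class AnchorDemo
{
    // IPv4 pattern from the question, with the leading ^ and trailing $ removed.
    const string IpCore =
        @"(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}" +
        @"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

    // Anchored: the entire input has to be an IP address.
    public static bool IsWholeStringIp(string s)
    {
        return Regex.IsMatch(s, "^" + IpCore + "$");
    }

    // Unanchored: an IP address may appear anywhere inside the input.
    public static bool ContainsIp(string s)
    {
        return Regex.IsMatch(s, IpCore);
    }
}
```

With the window title "98% of test.zip from 65.55.72.119 Completed", IsWholeStringIp returns false and ContainsIp returns true. Keep in mind that the unanchored form will also find an address inside a longer token, so in a combined pattern the surrounding whitespace has to act as the boundary.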

stema
  • I now see my mistake: I mistook ^ for negation, which I read about somewhere a while back. Thanks for pointing that out. – abduls85 Jun 09 '11 at 15:10
  • @abduls85 the `^` has more than one meaning: at the beginning of a character class `[]` it is a negation, e.g. `[^\d]` matches anything but a digit. – stema Jun 09 '11 at 18:59

Here you go.

^\d+%\s+of\s+(.+?)(\.[^.]*)\s+from\s+((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])|((([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])))\s+Completed

Remove the ^ and $ anchors from ValidIpAddressRegex and ValidHostnameRegex above, join the two with the alternation character (|), and enclose the result in parentheses.
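The same combination can be assembled from the two fragments in C# rather than typed out as one long literal. The following is a rough sketch, not a tested drop-in: the names DownloadTitle, TryGetAddress, and the named group "address" are assumptions made for this example, and the Regex is built once in a static field on the assumption that it will be reused across calls.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class DownloadTitle
{
    // The two patterns from the question, with ^ and $ removed.
    const string Ip =
        @"(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}" +
        @"([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])";
    const string Host =
        @"(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*" +
        @"([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])";

    // Built once and reused. The named group avoids counting the
    // nested numbered groups inside the two fragments.
    static readonly Regex Pattern = new Regex(
        @"^\d+%\s+of\s+(.+?)(\.[^.]*)\s+from\s+(?<address>" +
        Ip + "|" + Host + @")\s+Completed");

    // Returns true and the matched host or IP when the title has the expected form.
    public static bool TryGetAddress(string title, out string address)
    {
        Match m = Pattern.Match(title);
        address = m.Success ? m.Groups["address"].Value : string.Empty;
        return m.Success;
    }
}
```

Calling TryGetAddress with either sample title from the question should return true and yield the host name or IP address. A title such as "98% of test.zip from 999.1.1.1 Completed" should be rejected, because 999 is not a valid octet and host labels in this pattern must start with a letter.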

matk

You could use this; it should work for all cases. I might've accidentally deleted a character while formatting, so let me know if it doesn't work.

string captureString = "8% of setup_av_free.exe from software-files-l.cnet.com Completed";
Regex reg = new Regex(@"(?<perc>\d+)% of (?<file>\w+\.\w+) from (?<host>" +
    @"(\d+\.\d+\.\d+\.\d+)|(((https?|ftp|gopher|telnet|file|notes|ms-help):" +
    @"((//)|(\\\\))+)?[\w\d:#@%/;$()~_?\+-=\\\.&]*)) Completed");
Match m = reg.Match(captureString);
string perc = m.Groups["perc"].Value;
string file = m.Groups["file"].Value;
string host = m.Groups["host"].Value;
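When no match is found, the named groups simply hold empty strings, so it can help to check Match.Success before using them. Below is a small sketch of that idea. It deliberately swaps the host subpattern for a plain \S+ to keep the example short, and the names DownloadInfo and TryParse are invented here; neither is part of the code above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class DownloadInfo
{
    // Simplified variant: the host is any run of non-whitespace characters (assumption).
    static readonly Regex Reg = new Regex(
        @"(?<perc>[0-9]{1,3})% of (?<file>\w+\.\w+) from (?<host>\S+) Completed");

    // Returns false (and default values) when the title does not have the expected form.
    public static bool TryParse(string title, out int percent, out string file, out string host)
    {
        percent = 0;
        file = string.Empty;
        host = string.Empty;

        Match m = Reg.Match(title);
        if (!m.Success)
            return false;

        // One to three ASCII digits always fit in an int.
        percent = int.Parse(m.Groups["perc"].Value);
        file = m.Groups["file"].Value;
        host = m.Groups["host"].Value;
        return true;
    }
}
```

For the first sample title this should produce 8, "setup_av_free.exe" and "software-files-l.cnet.com"; for an unrelated window title it returns false without touching the groups.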
FlyingStreudel