I am looking for a regular expression that checks that two phrases both appear on a webpage at the same time.

The two phrases I need to find on the page are `Current QPS (last 10s, ignored 0)` and `Average Latency (last 100 queries)`.

The webpage looks like this (the numeric values will differ, but the text won't change):

Query Statistics

Average QPS 25.3673   
Average Latency 0.1002   
Average Latency (last 100 queries) 0.0834   # Match this one, ignore the output 0.0834
Average Search Latency 0.0555   
Average Docsum Latency 0.0330   
Sampling period 3133524.9570   
Current QPS (last 10s, ignored 0) 24.8000  # Also match this one, ignore the output 24.8000
Peak QPS 170.9000   
Number of requests 79717858   
Number of queries 79489080 

I am able to match each phrase on the website, but not the two phrases together. How can I make my tool ignore the content between the two phrases?

P.S. I am not programming in any language here, the regex will be put into a tool that accepts regex.

Madean
  • Basically, this is a duplicate of [this question](http://stackoverflow.com/questions/5809272/c-sharp-regular-expression-to-match-any-character). – O. R. Mapper Jun 12 '12 at 14:27
  • It involves some of the same issues.. but it's not really a duplicate. That one was asking how to deal with newlines, this one is asking how to combines regexes. – vergenzt Jun 12 '12 at 14:30

4 Answers

If you can be sure that they will appear in that order, if at all, then this should work:

(<query 1>).*(<query 2>)

E.g.

(Average Latency \(last \d+ queries\)).*(Current QPS \(last \d+s, ignored \d+\))

You may need to check that the . operator matches newlines in your tool.
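
To illustrate the point about newlines, here is a minimal Python sketch (the asker's tool is not Python, so this only shows the general behavior; `PAGE` and `both_phrases` are names made up for the example). It runs the pattern above against the sample page with and without a "dot matches newline" option, and reads the two phrases back out of the capture groups.

```python
import re

# Sample page text copied from the question (trailing spaces removed).
PAGE = """Query Statistics

Average QPS 25.3673
Average Latency 0.1002
Average Latency (last 100 queries) 0.0834
Average Search Latency 0.0555
Average Docsum Latency 0.0330
Sampling period 3133524.9570
Current QPS (last 10s, ignored 0) 24.8000
Peak QPS 170.9000
Number of requests 79717858
Number of queries 79489080
"""

# The pattern from the answer: two capture groups joined by ".*".
PATTERN = (
    r"(Average Latency \(last \d+ queries\))"
    r".*"
    r"(Current QPS \(last \d+s, ignored \d+\))"
)


def both_phrases(page, flags=0):
    """Return the two captured phrases, or None if the pattern does not match."""
    match = re.search(PATTERN, page, flags)
    return match.groups() if match else None


# By default "." stops at a newline, so phrases on different lines are not joined.
without_dotall = both_phrases(PAGE)

# With DOTALL, ".*" can span the lines in between.
with_dotall = both_phrases(PAGE, re.DOTALL)

print(without_dotall)
print(with_dotall)
```

The text between the phrases is still part of the overall match, but the two groups contain only the phrases themselves, which is usually what "ignoring the middle" amounts to.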

vergenzt
  • But I don't need the text between the two phrases. How can you get rid of them? – Madean Jun 12 '12 at 14:30
  • What tool are you using, and what are you trying to do if/when those patterns match? – vergenzt Jun 12 '12 at 14:38
  • I am using an enterprise tool. Basically, the tool accepts the regex and returns the page status as good if the two strings are found. – Madean Jun 12 '12 at 14:43
  • When you say you don't need the text between the two phrases, and you want to "ignore" it, do you mean that you want the text between the two patterns to not matter in whether or not the patterns match? If all you're doing is checking for matches, then the in-between text will not affect the results. – vergenzt Jun 12 '12 at 14:48
  • I think my tool lets `.` match newlines, since it is able to return all the content between the two phrases. – Madean Jun 12 '12 at 14:57
  • Yes, the text in between doesn't matter. Actually, it will always stay the same except for the outputs. Your suggestion makes sense. I am just curious about how to parse out the two phrases from that page without redundant matches. :) – Madean Jun 12 '12 at 15:03
  • Usually when regex is used in a programming context, there's a way to extract matched groups (matching text within parentheses--what you're looking for) and discard the rest. Is that what you mean? – vergenzt Jun 12 '12 at 15:07

My first suggestion is to simply combine the two patterns in one regular expression, covering each order in which you expect them to appear:

/($regex1.*?$regex2|$regex2.*?$regex1)/
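
As a rough illustration of the either-order idea, here is a Python sketch. It assumes the two sub-patterns from the other answer stand in for `$regex1` and `$regex2`, and it turns on `re.DOTALL` so that `.` can cross newlines; `has_both` and the sample strings are made up for the example.

```python
import re

# Stand-ins for $regex1 and $regex2 (taken from the phrases in the question).
REGEX1 = r"Average Latency \(last \d+ queries\)"
REGEX2 = r"Current QPS \(last \d+s, ignored \d+\)"

# Either order: regex1 ... regex2, or regex2 ... regex1.
EITHER_ORDER = re.compile(
    "(" + REGEX1 + ".*?" + REGEX2 + "|" + REGEX2 + ".*?" + REGEX1 + ")",
    re.DOTALL,  # let "." match newlines so the phrases may sit on different lines
)


def has_both(page):
    """True if both phrases occur somewhere in the page, in either order."""
    return EITHER_ORDER.search(page) is not None


page_order = (
    "Average Latency (last 100 queries) 0.0834\n"
    "Peak QPS 170.9000\n"
    "Current QPS (last 10s, ignored 0) 24.8000\n"
)
swapped_order = (
    "Current QPS (last 10s, ignored 0) 24.8000\n"
    "Average Latency (last 100 queries) 0.0834\n"
)
only_one = "Average Latency (last 100 queries) 0.0834\n"

print(has_both(page_order), has_both(swapped_order), has_both(only_one))
```

If the tool has no equivalent of `re.DOTALL`, the `.*?` parts will not cross line breaks, which may be why the expression fails there.
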
Hachi
  • Thanks for the help, but unfortunately the expression doesn't work in my tool. One question: does the `.*?` do the work of ignoring the middle part? – Madean Jun 13 '12 at 04:49
  • `.*?` matches the smallest possible stretch of text between the two expressions; you may have to set a flag for `.` to match newlines. – Hachi Jun 14 '12 at 21:19

It might depend on the tool you're using--specifically, how it handles multiple lines.

You can try this:

Average Latency \(last \d+ queries\)\s(.*\s)*Current QPS \(last \d+s, ignored \d+\)\s
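
A small Python sketch of why this can work even when `.` does not match newlines: each `\s` is allowed to consume a line break, so `(.*\s)*` steps through the page one chunk at a time. The second pattern, using `[\s\S]*?`, is a substitution of my own and not part of the answer above; whether the asker's tool accepts it is an assumption. It avoids the nested quantifier, which can backtrack heavily on long pages. `SAMPLE` is a shortened copy of the page from the question.

```python
import re

# Shortened sample of the page from the question.
SAMPLE = (
    "Average Latency (last 100 queries) 0.0834\n"
    "Average Search Latency 0.0555\n"
    "Sampling period 3133524.9570\n"
    "Current QPS (last 10s, ignored 0) 24.8000\n"
    "Peak QPS 170.9000\n"
)

# Pattern from the answer: "\s" consumes the newlines, so no dot-all flag is needed.
ANSWER_PATTERN = (
    r"Average Latency \(last \d+ queries\)\s(.*\s)*"
    r"Current QPS \(last \d+s, ignored \d+\)\s"
)

# Alternative (an assumption, not from the answer): "[\s\S]" matches any
# character including newlines, so no flag and no nested quantifier is needed.
PORTABLE_PATTERN = (
    r"Average Latency \(last \d+ queries\)[\s\S]*?"
    r"Current QPS \(last \d+s, ignored \d+\)"
)

# No flags are passed in either call.
answer_match = re.search(ANSWER_PATTERN, SAMPLE)
portable_match = re.search(PORTABLE_PATTERN, SAMPLE)

print(answer_match is not None, portable_match is not None)
```
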
Andrew Cheong

This should work

(?im)^(Average\s+Latency\s+\(last\s+100\s+queries\)|Current\s+QPS\s+\(last\s+10s,\s+ignored\s+0\)).+
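
One caveat, as far as I can tell: the alternation matches a line containing either phrase, so a single match alone does not show that both are present. The Python sketch below (sample strings and variable names are made up for the example) collects every match, so the caller can check that both phrases were found.

```python
import re

# The pattern from the answer; (?im) turns on case-insensitive and multiline mode.
PATTERN = re.compile(
    r"(?im)^(Average\s+Latency\s+\(last\s+100\s+queries\)"
    r"|Current\s+QPS\s+\(last\s+10s,\s+ignored\s+0\)).+"
)

full_page = (
    "Average Latency 0.1002\n"
    "Average Latency (last 100 queries) 0.0834\n"
    "Current QPS (last 10s, ignored 0) 24.8000\n"
    "Peak QPS 170.9000\n"
)
partial_page = (
    "Average Latency (last 100 queries) 0.0834\n"
    "Peak QPS 170.9000\n"
)

# Group 1 holds the phrase; the trailing ".+" (the number) is left out of the group.
found_full = [m.group(1) for m in PATTERN.finditer(full_page)]
found_partial = [m.group(1) for m in PATTERN.finditer(partial_page)]

print(found_full)     # both phrases, one match per line
print(found_partial)  # still matches, but only one phrase is present
```

So with this pattern the tool would need to require two matches, not just one, to confirm both phrases.
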
Cylian