I have been looking around and found what I thought was the right solution to my problem, a non-greedy quantifier, but it is not working as expected.

I am trying to isolate drop-down menus that have identical content (for a LoadRunner script). The HTML looks like this:

<input type="hidden" name="advanceDiscount" value="0"  /><table border="0" cellspacing="5"><tr><td align="left">Departure City :</td> <td><select name="depart" >
<option selected="selected" value="Denver">Denver</option>
<option value="Frankfurt">Frankfurt</option>
<option value="London">London</option>
<option value="Los Angeles">Los Angeles</option>
<option value="Paris">Paris</option>
<option value="Portland">Portland</option>
<option value="San Francisco">San Francisco</option>
<option value="Seattle">Seattle</option>
<option value="Sydney">Sydney</option>
<option value="Zurich">Zurich</option>
</select></td> <td align="left">Departure Date :</td> <td><input type="text" name="departDate" value="05/07/2014" size="10" maxlength="10" /> 
<!-- Departure Date Applet -->
<APPLET CODEBASE="/WebTours/classes/" CODE="FormDateUpdate.class" MAYSCRIPT Width=26 Height=28 BORDER=0>
   <PARAM NAME=CalenderTitle  VALUE="Select Departure Date">
   <PARAM NAME=HtmlFormIndex  VALUE=0>
   <PARAM NAME=HtmlEditIndex  VALUE=2>
   <PARAM NAME=AutoClose      VALUE=1>
   <PARAM NAME=Label          VALUE="...">
</APPLET>
</td></tr> <tr><td align="left">Arrival City :</td> <td><select name="arrive" >
<option selected="selected" value="Denver">Denver</option>
<option value="Frankfurt">Frankfurt</option>
<option value="London">London</option>
<option value="Los Angeles">Los Angeles</option>
<option value="Paris">Paris</option>
<option value="Portland">Portland</option>
<option value="San Francisco">San Francisco</option>
<option value="Seattle">Seattle</option>
<option value="Sydney">Sydney</option>
<option value="Zurich">Zurich</option>
</select></td> <td align="left">Return Date :</td> <td><input type="text" name="returnDate" value="05/08/2014" size="10" maxlength="10" /> 
<!-- Return Date Applet -->

The content I wish to capture runs from <select name="depart" > to the first </select></td> that follows it.

The regular expression I attempted was:

\Q<td><select name=\E"(.*\r\n)*(\Q</select></td>\E?)

Unfortunately, it captures everything up to the last </select></td>, even though I have specified a non-greedy "?" in the final group: (\Q</select></td>\E?)

Could anyone kindly point out my mistake, and possibly point me toward a solution?

As an extension, what would be the way to say "only the second occurrence onwards"? That is, starting from the second <select name=".*>.

Cheers!!

The answer to my problem was to use <td><select name="(.*\r\n)*?(</select></td>), in case anyone else wants to know.

Thanks MikeH-R!

  • Please read http://stackoverflow.com/a/1732454/1256925 – Joeytje50 May 06 '14 at 11:58
  • Regex and HTML? Prepare for the onslaught :) – Umair May 06 '14 at 11:58
  • In what language/environment are you doing this? – Bergi May 06 '14 at 11:59
  • First off, Joeytje50's comment is more apt. Secondly, the ? should come after the * to form `(.*\r\n)*?`, but please use an HTML parser instead. – Mike H-R May 06 '14 at 12:02
  • @Joeytje50 & @MikeH-R: I am using **LoadRunner Web HTTP Protocol**, which has a function called **web_reg_save_param_regexp** that is designed for left and right boundary capture of HTML code using regex. I am writing this in LoadRunner (language: C), but the function above takes a plain-text regex. – Chazara May 06 '14 at 12:20
  • @MikeH-R As you said, the "?" was in the wrong place. I used; ` – Chazara May 06 '14 at 12:25
  • Just to clarify things for those of you who don't use LoadRunner and are sending @Chazara to that other post: in LoadRunner you sometimes need to isolate a section of the server response, not parse it! Chazara doesn't want to parse the HTML but to retrieve something between two constant strings. The fact that the response is HTML is just a coincidence; it could be any protocol. – Buzzy May 06 '14 at 16:23

1 Answer

I'm reposting this as an answer since you said the comment solved your problem, but first I need to reiterate Joeytje50's comment: don't parse HTML with regexes.

Now that we've got that out of the way, and you promise to use this only for educational purposes and never ever in production, here's the solution: you had the ? in the wrong place. Placed after \E, it only makes the final > of the literal optional. You wanted it directly after the * to turn that quantifier from greedy into non-greedy:

\Q<td><select name=\E"(.*\r\n)*?(\Q</select></td>\E)
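
To see the difference outside LoadRunner, here is a minimal sketch in Python. It is only an illustration under some assumptions: Python's `re` module has no `\Q...\E`, so the literals are written out directly (as in the asker's final pattern), the page is a trimmed-down stand-in with CRLF line endings, and the helper names `first_block` and `blocks_from_second` are made up for this example. It also shows one possible way to approach the "second occurrence onwards" follow-up, by collecting every lazy match and dropping the first.

```python
import re

# Trimmed-down stand-in for the page in the question (CRLF line endings assumed).
PAGE = (
    '<td><select name="depart" >\r\n'
    '<option value="Denver">Denver</option>\r\n'
    '<option value="Zurich">Zurich</option>\r\n'
    '</select></td> <td align="left">Departure Date :</td>\r\n'
    '<td><select name="arrive" >\r\n'
    '<option value="Denver">Denver</option>\r\n'
    '<option value="Zurich">Zurich</option>\r\n'
    '</select></td> <td align="left">Return Date :</td>\r\n'
)

# Greedy: (...)* consumes as many lines as it can, then backtracks to the LAST closer.
GREEDY = re.compile(r'<td><select name="(.*\r\n)*(</select></td>)')
# Lazy: (...)*? consumes as few lines as it can, so it stops at the FIRST closer.
LAZY = re.compile(r'<td><select name="(.*\r\n)*?(</select></td>)')
# The question's placement: the trailing ? only makes the final ">" optional.
MISPLACED = re.compile(r'<td><select name="(.*\r\n)*(</select></td>?)')


def first_block(pattern, text):
    """Return the text of the first match of pattern, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


def blocks_from_second(text):
    """Return every lazy match except the first one."""
    return [m.group(0) for m in LAZY.finditer(text)][1:]


if __name__ == "__main__":
    print(first_block(GREEDY, PAGE).count("</select></td>"))  # closers swallowed by greedy
    print(first_block(LAZY, PAGE).count("</select></td>"))    # closers swallowed by lazy
    for block in blocks_from_second(PAGE):
        print(block.splitlines()[0])
```

With this sample, the greedy and misplaced-`?` patterns both run through to the second `</select></td>`, while the lazy pattern stops at the first one. Whether slicing off the first match is the right approach inside LoadRunner depends on what the capture function offers, so treat that part as a regex-level idea rather than a LoadRunner recipe.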
  • Mike, I will be using regex on HTML for the simple reason that I have to. I am using LoadRunner, a performance testing tool whose commonly accepted way of capturing specific content from a web page (in most cases, data for correlation) is a regexp function. LoadRunner's primary function is **to** capture and parse HTML. If you care to know more about regex with LoadRunner, http://www.jds.net.au/tech-tips/correlation-with-regex/ is worth a look. Now that we've got that out of the way ;), I'll mark your answer. Cheers again matey! I appreciate your effort. – Chazara May 06 '14 at 12:36