I want to retrieve content from a webpage using JMeter.
The data I'm looking for is inside a JavaScript block:
(...)
<map id="id1">
<script type="text/javascript">
var name="Lionel Richie";
var song="Hello";
var lyrics="Is it me you're looking for ?";
</script>
(...)
<script type="text/javascript">
var name="Waldo";
</script>
</map>
(...)
Let's say I want the value of the name variable inside a script block in the map with id=id1, where there is also a song variable.
I use an XPath Extractor to get the script content (the CSS/JQuery Extractor won't return the JavaScript content, as it's not pure HTML):
.//map[@id='id1']/script[contains(.,'song')]
XPath won't find the data because my HTML is dirty (missing closing tags and so on), so I need to clean it up using JTidy (the "Use Tidy (tolerant parser)" option).
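To make clear what the expression above is meant to select, here is a minimal Java sketch that runs the same XPath against a small, already well-formed fragment. The class name, the method name and the CLEAN fragment are hypothetical and only mirror the sample page; the sketch uses the JDK's own XML parser, not JMeter's XPath Extractor.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ScriptXPathSketch {
    // Hypothetical, well-formed version of the sample page fragment.
    static final String CLEAN =
        "<html><body><map id=\"id1\">"
      + "<script type=\"text/javascript\">var name=\"Lionel Richie\"; var song=\"Hello\";</script>"
      + "<script type=\"text/javascript\">var name=\"Waldo\";</script>"
      + "</map></body></html>";

    // Returns the text of the first script in map id1 that mentions 'song'.
    static String selectScript(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath()
            .evaluate(".//map[@id='id1']/script[contains(.,'song')]", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(selectScript(CLEAN));
    }
}
```

Only the first script block matches, because the second one (Waldo) contains no song variable. On the real page this only works once the markup has been made well-formed.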
Remarks:
- I do not own the webpage I'm processing. I have to deal with this hideous HTML.
- There are many map elements in the webpage, each of them having a script with a song variable, so I can't directly use a regexp (as far as I know).
Problem:
My HTML contains accented international characters (wé hà bêêêê... yep, French, sorry about that) and JTidy doesn't handle this particular case properly: bug #205, "StringIndexOutOfBoundsException while lexing script content".
As a result, the XPath Extractor fails and my entire test plan is stuck.
I designed a custom solution, but I find it a bit complex. Maybe I can handle this in a better way.
My solution:
I used the tagsoup Java library to clean the HTML output and store it in a JMeter variable. That variable is then processed by the XPath Extractor (tick the "JMeter variable" option in "Apply to"), and finally I used a regexp on the XPath result to get my Lionel Richie stuff working...
JMeter
|->HTTP Request
|->BeanShell PostProcessor->tagsoup > var RESPONSE
|->Xpath Extractor, Apply to var RESPONSE > var XPATH_OUTPUT
|->Regular Expression Extractor, Apply to var XPATH_OUTPUT
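The last step of that chain can be sketched in plain Java. This is an illustration only: the pattern, the class name and the method name are assumptions, not the exact expression configured in the Regular Expression Extractor, and the input stands in for the content of XPATH_OUTPUT.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameRegexSketch {
    // Assumed pattern: captures the quoted value assigned to "var name".
    static final Pattern NAME =
        Pattern.compile("var\\s+name\\s*=\\s*\"([^\"]*)\"");

    // Returns the captured name, or null when the script has no name variable.
    static String extractName(String script) {
        Matcher m = NAME.matcher(script);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Stand-in for the script text selected by the XPath step.
        String xpathOutput =
            "var name=\"Lionel Richie\";\nvar song=\"Hello\";";
        System.out.println(extractName(xpathOutput));
    }
}
```

In the Regular Expression Extractor itself, the equivalent would be the same expression with a template of $1$, applied to the XPATH_OUTPUT variable.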
To get tagsoup working with JMeter, just put the jar in the lib directory, and then use a BeanShell PostProcessor.
BeanShell code used:
import org.xml.sax.*;
import org.ccil.cowan.tagsoup.*;

// Get the response data of the previous sampler
String rep = prev.getResponseDataAsString();

// Configure the tagsoup parser with the HTML schema
XMLReader r = new Parser();
HTMLSchema theSchema = new HTMLSchema();
r.setProperty(Parser.schemaProperty, theSchema);

// Serialize the cleaned document as UTF-8, matching the decoding below
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
Writer w = new OutputStreamWriter(outStream, "UTF-8");
XMLWriter x = new XMLWriter(w);
x.setPrefix(theSchema.getURI(), "");
r.setContentHandler(x);
r.parse(new InputSource(new StringReader(rep)));
w.flush();

// Store the cleaned XML in a JMeter variable for the XPath Extractor
String encodedRep = outStream.toString("UTF-8");
vars.put("RESPONSE", encodedRep);