Regular expression - ignore first 7 characters from extracted data

Question

I am extracting data from a page source. In the extracted data, I need to display only the text after the opening `"`. I tried different options, but none of them worked. Any suggestions? Page source text:

input type name=loginForm_SUBMIT value="1" /input type=""name="faces.ViewState" id="faces.ViewState" value="9uiY/UWJ1/w3PQ==" /><

Regular expression: `value="[^"1" ].*\w==`

Output: `value="9uiY/UWJ1/w3PQ==`

Expected output: `9uiY/UWJ1/w3PQ==`

Recommended reading: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — nicael, Apr 01 '18 at 14:33
Can your language use capture groups? F.e. `value="([A-Za-z0-9\/]*==)"` then get capture group $1. And btw, for what language or regex engine is this? F.e. in the PCRE regex engine you can use \K, but not in the simple regex engine used in javascript. — LukStorms, Apr 01 '18 at 14:49
If you insist on keeping your version, please use code blocks `{}` for code. Did you see my full-featured answer? — Gilles Quénot, Apr 01 '18 at 14:51
When you have text output, [don't take a picture but copy paste the output in your POST](https://unix.meta.stackexchange.com/questions/4086/psa-please-dont-post-images-of-text) The html can be copied as well with right click -> copy as outerHTML. — Gilles Quénot, Apr 01 '18 at 14:54
Thx Gilles, nicael and LukStorms. The links and recommendations were helpful. — Tester, Apr 02 '18 at 00:32

score 0 · Answer 1 · answered Apr 01 '18 at 14:38

Don't parse XML/HTML with regex, use a proper XML/HTML parser and a powerful xpath query.

theory :

According to compilation theory, XML/HTML can't be parsed with regular expressions, which are based on finite state machines. Because XML/HTML is hierarchical (elements can nest to arbitrary depth), you need a pushdown automaton, for example an LALR grammar handled with a tool like YACC.
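To make the point about nesting concrete, here is a toy sketch (not a real parser). A regex is used only to recognise individual tags, and an explicit stack, the defining feature of a pushdown automaton, tracks the nesting. The names `TAG` and `is_well_nested` are made up for this illustration, and the sketch ignores comments, void elements written without a trailing slash, and attribute values containing `>`.

```python
import re

# Tokenizer only: recognises one tag at a time (hypothetical helper pattern).
# Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" if self-closing.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")


def is_well_nested(markup: str) -> bool:
    """Return True if every opening tag is closed in the right order."""
    stack = []  # the "pushdown" part: remembers which tags are still open
    for closing, name, self_closing in TAG.findall(markup):
        if self_closing:
            continue  # <br/> style tags neither open nor close anything
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # closing tag without a matching opener
    return not stack  # leftover openers mean something was never closed


print(is_well_nested("<a><b></b></a>"))
print(is_well_nested("<a><b></a></b>"))
```

The regex alone cannot remember how many tags are open; the stack does that work, which is why real tools rely on a parser.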

realLife©®™ everyday tool in a shell :

You can use one of the following :

xmllint, often installed by default with libxml2, xpath1 (check my wrapper to have newline-delimited output)

xmlstarlet can edit, select, transform... Not installed by default, xpath1

xpath installed via perl's module XML::XPath, xpath1

xidel xpath3

saxon-lint my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3

Or you can use a high-level language with a proper library, such as:

python's lxml (from lxml import etree)

perl's XML::LibXML, XML::XPath, XML::Twig::XPath, HTML::TreeBuilder::XPath

ruby nokogiri, check this example

php DOMXpath, check this example

Check: Using regular expressions with HTML tags

Example using xpath :

xmllint --html --xpath 'string(//input[@value][2]/@value)' file

Output :

9uiY/UWJ1/w3PQ==
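For readers who prefer a scripting language to a shell tool, the same idea can be sketched in Python. This is not the lxml/xpath route listed above: to stay dependency-free it swaps in the standard-library `html.parser` module and looks the field up by its `name` attribute. The sample markup (including `type="hidden"`) and the class name `InputValues` are assumptions made for the illustration.

```python
from html.parser import HTMLParser

# Sample modeled on the question's page source; the exact markup is assumed.
page = (
    '<input type="hidden" name="loginForm_SUBMIT" value="1" />'
    '<input type="hidden" name="faces.ViewState" id="faces.ViewState" '
    'value="9uiY/UWJ1/w3PQ==" />'
)


class InputValues(HTMLParser):
    """Collect name -> value for every <input> that carries both attributes."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr_map = dict(attrs)
        if "name" in attr_map and "value" in attr_map:
            self.by_name[attr_map["name"]] = attr_map["value"]


parser = InputValues()
parser.feed(page)
parser.close()

view_state = parser.by_name.get("faces.ViewState")
print(view_state)
```

Selecting by name rather than by position means the lookup keeps working if another `input` is added before the one you want.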

Regular Expression Extractor in JMeter does not support xpath. I am testing this in https://regexr.com/. xpath not supported there as well — Tester, Apr 01 '18 at 15:32
Feel free to use a bazooka to kill a fly. You never added the JMeter tag anywhere. — Gilles Quénot, Apr 01 '18 at 15:41

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

You may try this

(?:value[^v]*value=\")([^\"]*)

The output you want is captured in group 1, and you can retrieve it by backreference \1 or $1. Demo
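As a quick check of how group 1 is retrieved, here is a minimal Python sketch. The sample string is modeled on the question's page source (the exact markup, such as `type="hidden"`, is an assumption), and `page`, `pattern`, and `view_state` are illustrative names only.

```python
import re

# Sample modeled on the question's page source; the exact markup is assumed.
page = (
    '<input type="hidden" name="loginForm_SUBMIT" value="1" />'
    '<input type="hidden" name="faces.ViewState" id="faces.ViewState" '
    'value="9uiY/UWJ1/w3PQ==" />'
)

# The regex from this answer: the non-capturing group skips past the first
# value=..., and the capturing group grabs what follows the second value=".
pattern = re.compile(r'(?:value[^v]*value=\")([^\"]*)')

match = pattern.search(page)
whole_match = match.group(0) if match else None  # still includes the prefix
view_state = match.group(1) if match else None   # only the wanted text

print(view_state)
```

In JMeter's Regular Expression Extractor, the corresponding step should be to set the Template field to `$1$` so that only group 1 is stored. Note that `[^v]*` assumes there is no lowercase `v` between the two `value` attributes.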

The string `value=` occurs twice in your sample text, so it seems you used the regex `value="[^"1" ].*\w==` to skip the first occurrence and match the second one.

But that regex is wrong, because a character class `[...]` matches exactly one character. Only when the character class is followed by a quantifier (repeater) such as `*`, `+`, or `{min,max}` can it match a string of multiple characters.
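The single-character behaviour is easy to verify. In this small sketch the shortened sample string and the variable names are assumptions for illustration; the first pattern is the one from the question.

```python
import re

# Shortened stand-in for the question's page source (assumed).
text = 'value="1" value="9uiY/UWJ1/w3PQ=="'

# [^"1" ] is ONE character that is not a quote, a "1", or a space.
# The repeated quote inside the brackets adds nothing.
one_char = re.compile(r'[^"1" ]')
assert one_char.fullmatch("9") is not None   # a single allowed character
assert one_char.fullmatch("9u") is None      # two characters: no match
assert one_char.fullmatch("1") is None       # excluded character

# Only with a quantifier can the class cover several characters.
assert re.fullmatch(r'[^"1" ]+', "9u") is not None

# The question's regex skips value="1" because "1" is excluded by the class,
# but the overall match still contains the value=" prefix.
whole = re.search(r'value="[^"1" ].*\w==', text).group(0)
print(whole)
```

So the class is what makes the original pattern skip the first `value="1"`, while the unwanted prefix remains because nothing in the pattern separates it from the part to keep.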

answered Apr 01 '18 at 15:25

Thm Lee


Thx Thm. The recommendation worked with a little tweak. It's working and grabbing the exact text. – Tester Apr 02 '18 at 00:31