-3

I am extracting data from a page source. From the extracted data, I need to display only the text after the ". I tried different options, but none of them worked. Any suggestions? Page source text:

input type name=loginForm_SUBMIT value="1" /input type=""name="faces.ViewState" id="faces.ViewState" value="9uiY/UWJ1/w3PQ==" /><

regular expression: value="[^"1" ].*\w==

Output: value="9uiY/UWJ1/w3PQ==

Expected Output: 9uiY/UWJ1/w3PQ==

Tester
  • Recommended reading: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – nicael Apr 01 '18 at 14:33
  • Can your language use capture groups? F.e. `value="([A-Za-z0-9\/]*==)"` then get capture group $1. And btw, for what language or regex engine is this? F.e. in the PCRE regex engine you can use \K, but not in the simple regex engine used in javascript. – LukStorms Apr 01 '18 at 14:49
  • If you insist on keeping your version, please use code blocks `{}` for code. And did you see my full-featured answer? – Gilles Quénot Apr 01 '18 at 14:51
  • When you have text output, [don't take a picture but copy paste the output in your POST](https://unix.meta.stackexchange.com/questions/4086/psa-please-dont-post-images-of-text) The html can be copied as well with right click -> copy as outerHTML. – Gilles Quénot Apr 01 '18 at 14:54
  • Thx Gilles, nicael and LukStorms. The links and recommendations were helpful. – Tester Apr 02 '18 at 00:32

2 Answers

0

Don't parse XML/HTML with regex, use a proper XML/HTML parser and a powerful query.

theory :

According to compiler theory, XML/HTML can't be parsed with a regex, because a regex is based on a finite state machine. Since XML/HTML is hierarchical, you need a pushdown automaton, for example an LALR grammar handled by a tool like YACC.
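To make the point above concrete, here is a small sketch in Python (standard library `re` only). The two `<div>` strings are made-up samples, not taken from the question. It shows that neither a lazy nor a greedy pattern can pair opening and closing tags correctly in general.

```python
import re

# Hypothetical samples, chosen only to illustrate the nesting problem.
nested = "<div>a<div>b</div>c</div>"
siblings = "<div>a</div><div>b</div>"

lazy = re.compile(r"<div>(.*?)</div>")
greedy = re.compile(r"<div>(.*)</div>")

# The lazy pattern stops at the first closing tag, which belongs to the
# inner <div>, so the outer element is cut short.
lazy_on_nested = lazy.search(nested).group(1)

# The greedy pattern runs to the last closing tag, so two sibling
# elements are glued together into one match.
greedy_on_siblings = greedy.search(siblings).group(1)

print(lazy_on_nested)       # a<div>b
print(greedy_on_siblings)   # a</div><div>b
```

Neither result corresponds to a real element, because the pattern has no way to count how many tags are currently open.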

realLife©®™ everyday tool in a shell :

You can use one of the following :

xmllint often installed by default with libxml2, xpath1 (check my wrapper to have newline-delimited output)

xmlstarlet can edit, select, transform... Not installed by default, xpath1

xpath installed via perl's module XML::XPath, xpath1

xidel xpath3

saxon-lint my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3

or you can use high level languages and proper libs, I think of :

Python's lxml (from lxml import etree)

Perl's XML::LibXML, XML::XPath, XML::Twig::XPath, HTML::TreeBuilder::XPath

, check this example

PHP's DOMXpath, check this example


Check: Using regular expressions with HTML tags


Example using xmllint :

xmllint --html --xpath 'string(//input[@value][2]/@value)' file

Output :

9uiY/UWJ1/w3PQ==
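
If no command-line tool is at hand, a similar lookup can be sketched with Python's standard-library `html.parser`. This is a swapped-in technique, not the XPath approach used by the xmllint command above. It selects the input by its `name` attribute rather than by position. The `SAMPLE` markup is a cleaned-up guess at the question's page source (the `type="hidden"` attributes are an assumption), and `InputValueFinder` and `find_input_value` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Assumed, cleaned-up version of the markup from the question.
SAMPLE = (
    '<input type="hidden" name="loginForm_SUBMIT" value="1" />'
    '<input type="hidden" name="faces.ViewState" id="faces.ViewState" '
    'value="9uiY/UWJ1/w3PQ==" />'
)


class InputValueFinder(HTMLParser):
    """Remember the value of the first <input> with the wanted name."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also end up here,
        # because the default handle_startendtag calls this method.
        if tag != "input" or self.value is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.wanted_name:
            self.value = attributes.get("value")


def find_input_value(html, name):
    """Return the value attribute of the named input, or None."""
    finder = InputValueFinder(name)
    finder.feed(html)
    finder.close()
    return finder.value


print(find_input_value(SAMPLE, "faces.ViewState"))  # 9uiY/UWJ1/w3PQ==
```

Selecting by name avoids depending on the order of the inputs, which a positional selector relies on.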
Gilles Quénot
0

You may try this

(?:value[^v]*value=\")([^\"]*)

The output you want is captured in group 1, and you can retrieve it by backreference \1 or $1. Demo

"value=" occurs twice in your sample text, so it seems you used the regex (value="[^"1" ].*\w==) to skip the first occurrence and match the second one.

But the regex is wrong because a character class '[...]' matches exactly one character. Only when the character class is followed by a quantifier (repeater) such as *, +, or {min,max} can it match a string of several characters.
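
As a rough illustration of the difference, the sketch below runs both patterns with Python's `re` module. The engine choice is an assumption, since the question does not say which one is used. `SAMPLE` is a cleaned-up guess at the page source, with `type="hidden"` added as an assumption.

```python
import re

# Assumed, cleaned-up version of the markup from the question.
SAMPLE = (
    '<input type="hidden" name="loginForm_SUBMIT" value="1" />'
    '<input type="hidden" name="faces.ViewState" id="faces.ViewState" '
    'value="9uiY/UWJ1/w3PQ==" />'
)

# The question's pattern: [^"1" ] consumes a single character that is
# not a quote, a 1, or a space, and the whole match keeps the prefix.
question_pattern = re.compile(r'value="[^"1" ].*\w==')
whole_match = question_pattern.search(SAMPLE).group(0)

# The answer's pattern: skip from the first "value" to the second one,
# then capture everything up to the next double quote in group 1.
answer_pattern = re.compile(r'(?:value[^v]*value=")([^"]*)')
captured = answer_pattern.search(SAMPLE).group(1)

print(whole_match)  # value="9uiY/UWJ1/w3PQ==
print(captured)     # 9uiY/UWJ1/w3PQ==
```

Note that `[^v]*` only works here because no lowercase v appears between the two `value` attributes. Markup with a v in that stretch would need a different pattern.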

Thm Lee
  • Thx Thm. The recommendation worked with a little tweak. It's working and grabbing the exact text. – Tester Apr 02 '18 at 00:31