I am trying to use a regex to extract the end of the Twitter link, where the only unique identifier is the class value fl. Thus, the regex (to the best of my knowledge) must include:
class=\"fl\"
account for the changing middle section, where \S+ does not work because the section contains spaces, and then find and group on:
data-href="http://www.twitter.com/(newyorklife)
where the group is the part in parentheses. The whole string I am trying to parse is:
<g-link class="fl"><a href="/url?sa=t&rct=j&q=&esrc=s&source=web&cd=32&cad=rja&uact=8&ved=0ahUKEwjknIy87oHWAhXHi1QKHXQdAJsQ9zAIyQEwHw&url=http%3A%2F%2Fwww.twitter.com%2Fnewyorklife&usg=AFQjCNHKcAcw6H6cYG3YH1j4V3UOxX1whw" onmousedown="return rwt(this,'','','','32','AFQjCNHKcAcw6H6cYG3YH1j4V3UOxX1whw','','0ahUKEwjknIy87oHWAhXHi1QKHXQdAJsQ9zAIyQEwHw','','',event)" data-href="http://www.twitter.com/newyorklife"><div jsl="$t t-XNwoAoU5dyo;$x 0;" class="r-iBA3fWkVHWLE"><g-img class="_tek"><img id="uid_4" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAABZUlEQVR4AWLQWfWQpmjUAjxo1IJRC2wBpJTDQkVRFIafKBujZrnGjbNd84wHiJNs27btrm3rrFbW1T73m20u/yMsW0cBg6zue5XCYLFQcC41JK0I3PsYaWvC+BkugYFljrbmWPp/H/86FOnhB2hGZbTg/dBhFoEBhsoEAO23Su9+5s/9nA0R/ANtXEgNJTtiAgObfB28gZaKt8Wen2ZarhRgjVL8nagGmetC+IFMb5lgqOtOZAtsLVgjcIhFZqD+RLYj0IFzGCwUcRctc7XgNNcyA7GBhAW+EWvnHK3XCjqDhg3OUpvAEegFTgAdA+nrwnuF4zCw7DSlwqOPscRxUAmtiYqY5NDXImz/6mPprlAP1sDgcjdFLokdCkPGW6Kstmbhtoim2IWNsRsvFXNsjURvBmvgiMROc11S0+BhVvmhFAUDhewrISgbg4/qlyUdeEnl+sBk7SOgfcBSb3jWaKMWjFoAABKespvtvzYlAAAAAElFTkSuQmCC" data-deferred="1" class="_WCg" height="32" width="32" alt="" onload="typeof google==='object'&&google.aft&&google.aft(this)"></g-img></div>Twitter</a></g-link>
I don't know whether regex has a way to skip the entire middle section, which contains so many special characters. I have been experimenting at pythex.org for a while and can't find a construct that simply matches an initial value and then skips everything until the specified values. Any ideas?
Edit: I want the string 'newyorklife' as the output. This value changes, though, so really I just want the \w+ that comes after twitter.com/. The issue is that class="fl" is the only unique identifier for the line on the webpage (twitter and data-href also show up elsewhere on the page).
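To make the goal concrete, here is a minimal Python sketch of the kind of pattern I have in mind. It assumes a lazy `.*?` with `re.DOTALL` is an acceptable way to skip the middle section. The `html` string is a shortened, hypothetical stand-in for the full element above, and the names `pattern` and `handle` are my own.

```python
import re

# Shortened, hypothetical stand-in for the full <g-link> element shown above.
html = (
    '<g-link class="fl"><a href="/url?sa=t&url=http%3A%2F%2Fwww.twitter.com%2Fnewyorklife" '
    'onmousedown="return rwt(this)" '
    'data-href="http://www.twitter.com/newyorklife"><div class="r-iBA3fWkVHWLE">'
    '</div>Twitter</a></g-link>'
)

pattern = re.compile(
    r'class="fl"'                                  # the only unique identifier
    r'.*?'                                         # lazily skip the changing middle section
    r'data-href="http://www\.twitter\.com/(\w+)"', # capture the \w+ after twitter.com/
    re.DOTALL,                                     # let . also match newlines
)

match = pattern.search(html)
handle = match.group(1) if match else None
print(handle)
```

One caveat with this sketch: because `.*?` can cross tag boundaries, a class="fl" element without a Twitter link could match up to a later data-href elsewhere on the page. Replacing `.*?` with something like `[^>]*>\s*<a[^>]*?` might keep the match inside the first anchor tag, though I have not checked that against the whole page.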