
I have to extract a password from an email using a regex, because the XPath of the password's location is dynamic.

Below is the XPath of the password location:

//*[@id=':k6']/div[1]/div/div[3]/div[3]/table/tbody/tr[3]/td

A sample password at this location:

QFYFV3WL$8H!

Here the id is dynamic, so the first challenge is to write a regex for the id. Second, we need a regex to extract the password from the password field. Every character of the password is dynamic, and it may contain any character.
Any help is appreciated.

Navnath Godse
Rachit
  • We love challenges! ♥ ... but what have you tried? – hsz Jul 16 '13 at 11:40
  • Your question isn't exactly clear. What are the length limits? – Security Hound Jul 16 '13 at 11:45
  • I don't get it. If both the location in the XML and the password itself are unknown, then you can't extract it. Or do you have any additional criteria which you didn't tell us? – Bergi Jul 16 '13 at 12:01

1 Answer


What you're trying to do can't be accomplished with a regex. There are cases where you can use a regex on an HTML document, but this is exactly the case where you can't: you need to navigate the DOM, and a regex engine has no notion of document structure. HTML is not a regular language and so can't be parsed using regular expressions.

A regex is acceptable only when you're treating the HTML document as if it were just a bunch of text. If you need to get inside tags, what you need is a DOM parser.
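
As a rough illustration of that approach, here is a minimal sketch in Python using only the standard library's `html.parser`. It assumes the password always sits in the first cell of the third table row, as the XPath in the question suggests, and it never looks at the changing `id`. The names `TableCellCollector` and `extract_cell` and the sample markup are hypothetical, made up for this example; a real email body will likely need a more specific way to pick the right table.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text is only recorded while a <td> is open.
        if self._in_cell:
            self.rows[-1][-1] += data


def extract_cell(html, row, col=0):
    """Return the stripped text of the cell at (row, col), zero-based."""
    parser = TableCellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows[row][col].strip()


# Hypothetical email body: the wrapper id changes on every load,
# but the position of the password cell inside the table does not.
sample_html = """
<div id=":k6"><table><tbody>
  <tr><td>Account</td></tr>
  <tr><td>Username: example</td></tr>
  <tr><td>QFYFV3WL$8H!</td></tr>
</tbody></table></div>
"""

password = extract_cell(sample_html, row=2)
```

Because the lookup is by position inside the table rather than by the `id` attribute, the dynamic id stops mattering. This sketch flattens all rows in the document into one list, so it would need extra logic for emails that contain several or nested tables.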

To quote a famous answer regarding this topic here on SO:

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp

Community
bluehallu