
I have an email that looks like this:

We’ve received a request to change your email address to example@thisexample.com.

To complete the process, please verify your email address by entering the following verification code.

86761G

This code is temporary and will expire in 30 minutes.

If this wasn’t requested by you, your account information will remain unchanged. No further action is required.

Warm regards, Example.com

I need to parse out the verification code: 86761G. The catch is that the code is dynamic, meaning it changes with every email. What IS static, though, is the layout of the email, so my thought was to grab it by line index [2] (even though it looks like there are blank lines in between, it's the third <p> tag in the div, hence index [2] when splitting on new lines). My other idea was to do it via the HTML somehow (I don't really want to use HTMLAgilityPack). The HTML for the div is as follows:

<td colspan="2" style="padding:1.2em 45px 2em 45px;color:#000;font-family:Corbel, 'Trebuchet MS', 'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:.875em;line-height:1.1em;">
<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>


<p>Warm regards,<br>
example.com</p>
</td>

Any idea how to parse this data out? I was thinking Regex if possible, even though I know that Regex isn't meant for HTML because it's not regular text. If I need HTMLAgilityPack I'll use it, if not though I prefer not. Thank you guys!

Oh, side note: I'm using Firefox via Selenium, so there's always the option to use its built-in functions to grab it somehow?

Edit: I'm so stupid. Selenium - FindElementByXPath (facepalm)

Frank
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – BradleyDotNET Feb 10 '15 at 00:31
  • "I was thinking Regex if possible, even though I know that Regex isn't meant for HTML because it's not regular text" – Frank Feb 10 '15 at 00:33
  • Read that a bit too late :) Still important to note. Not closing as a duplicate since you only thought that was a *potential* solution. – BradleyDotNET Feb 10 '15 at 00:36
  • Well, I know that you're not really supposed to, but I've parsed HTML with Regex before, so I know it's a POSSIBLE thing to do, just not recommended per se ;) – Frank Feb 10 '15 at 00:39
  • Regex can't match indefinitely nested elements; the best you can do is extract tokens from the HTML, which is enough in this case – Eduardo Wada Feb 10 '15 at 00:56

4 Answers

1

Contrary to popular (and misinformed, imo) opinion, you can use a regular expression to extract this, because the fixed, non-nested structure of this particular document does, in fact, meet the requirements to be considered a Regular Grammar ( http://en.wikipedia.org/wiki/Chomsky_hierarchy ).

Here's a regex I would use:

following verification code\.</p>\s*<p>(\S+)</p>

Note the lack of any anchors (^ and $): the regex uses the known text "following verification code" to match just before the code. The verification code is then contained in the regex's single capture group (group 1).
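As a rough illustration of the anchor-text approach, here is a minimal sketch in Python. It is an assumption that Python is acceptable for illustration; the question appears to be using .NET, where the same pattern syntax should carry over to the Regex class. The sample string is trimmed from the markup in the question, and the variable names are made up for this example.

```python
import re

# Sample body trimmed from the question's markup (only the relevant paragraphs).
html = """<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>"""

# The fixed sentence ending acts as the anchor; group 1 captures the code.
pattern = re.compile(r"following verification code\.</p>\s*<p>(\S+)</p>")

match = pattern.search(html)
# search() returns None when the layout changes, so guard before reading the group.
code = match.group(1) if match else None
print(code)
```

Checking for a missing match matters here: if the email wording ever changes, the pattern silently stops matching rather than returning a wrong value.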

Dai
1

If you are using Selenium, the simplest way is most likely to match it with the following CSS selector: p:nth-child(3)
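The same positional idea (take the third paragraph) can also be sketched without a browser. The following is only an illustration using Python's standard-library HTML parser, not Selenium itself; the ParagraphCollector class is a hypothetical helper written for this example, and the sample markup is shortened from the question.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <p> element in order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current paragraph.
        if self._in_p:
            self.paragraphs[-1] += data


html = """<td colspan="2">
<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
</td>"""

collector = ParagraphCollector()
collector.feed(html)

# Index 2 is the third <p>, the same element p:nth-child(3) would select here.
code = collector.paragraphs[2].strip()
print(code)
```

Like the selector, this relies on the code always being the third paragraph of the cell.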

Eduardo Wada
0

Since you've mentioned that only the verification code part is dynamic, I'm assuming the whole markup structure won't change.

If this is true, you could use

<p>(.*?)<\/p>

This will match each <p> element and capture its contents in group 1; the third match holds your verification code.
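A minimal sketch of that idea in Python (an assumption for illustration; the question appears to target .NET, where iterating over Regex.Matches would play the same role). The sample markup is copied from the question with the style attribute dropped.

```python
import re

html = """<td colspan="2">
<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>

<p>Warm regards,<br>
example.com</p>
</td>"""

# With a single capture group, findall returns the contents of each matched <p>.
# "." does not cross line breaks by default, so the two-line signature
# paragraph is simply skipped.
paragraphs = re.findall(r"<p>(.*?)</p>", html)

# The third match (index 2) is the verification code.
code = paragraphs[2]
print(code)
```

This depends entirely on the paragraph order staying fixed, which is the assumption stated above.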

Marko Gresak
0

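You can use the following regular expression if the email is always exactly the same except for the changing code:

<p>(?<d>[^\s<]+)</p>

The code is the only paragraph that contains no whitespace, so it is the only match, and the named group d holds it.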

If it is more complex, you can do this:

<p>(?<d>.*?)</p>

This finds every single-line paragraph; you can then iterate over the matches and find the code by eliminating the constant strings, such as:

To complete the process, please verify your email address by entering the following verification code.
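The elimination step could look roughly like the sketch below, written in Python for illustration (the question appears to use .NET). Treating the constant strings as prefixes is an assumption made here, because the first paragraph ends with the email address, which also varies. KNOWN_PREFIXES is a hypothetical name, and Python spells named groups as (?P<d>...) rather than (?<d>...).

```python
import re

html = """<p>We’ve received a request to change your email address to example@thisexample.com.</p>
<p>To complete the process, please verify your email address by entering the following verification code.</p>
<p>86761G</p>
<p>This code is temporary and will expire in 30 minutes.</p>
<p>If this wasn’t requested by you, your account information will remain unchanged. No further action is required.</p>"""

# Hypothetical list of fixed sentence openings taken from the sample email.
KNOWN_PREFIXES = (
    "We’ve received a request to change your email address to",
    "To complete the process",
    "This code is temporary",
    "If this wasn’t requested by you",
)

# Collect every single-line paragraph via the named group "d".
paragraphs = [m.group("d") for m in re.finditer(r"<p>(?P<d>.*?)</p>", html)]

# Whatever does not start with a known sentence is a candidate for the code.
candidates = [p for p in paragraphs if not p.startswith(KNOWN_PREFIXES)]
print(candidates)
```

If the list ever holds more or fewer than one candidate, the email layout has probably changed and the result should not be trusted.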

Eddy K