I have a piece of HTML:

<tr style="padding:0;vertical-align:top;text-align:left"> 
                           <td style="word-break:break-word;border-collapse:collapse;padding:5px 10px;vertical-align:top;text-align:left;color:rgb(51,51,51);font-family:Helvetica,Arial,sans-serif;font-weight:bold;margin:0;line-height:19px;font-size:14px;width:270px;border-bottom:1px dotted rgb(212,212,212);border-left:none"> Traveler email </td> 
                           <td style="word-break:break-word;border-collapse:collapse;padding:5px 10px;vertical-align:top;text-align:left;color:rgb(51,51,51);font-family:Helvetica,Arial,sans-serif;font-weight:normal;margin:0;line-height:19px;font-size:14px;width:270px;border-bottom:1px dotted rgb(212,212,212)"> 
                            <div align="right"> 
                             <a href="mailto:anarky@gmail.com" style="color:rgb(42,110,187);text-decoration:none" target="_blank">anarky@gmail.com</a> 
                            </div> </td> 
                          </tr>

I want to grab the traveler's email address. I can't simply match on the word mailto, because the page contains several email addresses. So I think it would be more specific to start the regex from "Traveler email".
This is the expression I came up with:

/Traveler\semail+([^mailto:]+)/

But it doesn't work.
Any advice would be appreciated, thank you.

Fatimah Wulandari

1 Answer

There are a couple of issues in your approach: you need the "DOTALL" flag so that the pattern can match across the multiline snippet, and you are not capturing the actual email address after "mailto:". As other commenters pointed out, your pattern also has some basic regex syntax issues. For instance, [^mailto:] is a character class that excludes the single characters m, a, i, l, t, o and :, not the string "mailto:". Here is a small PHP file that does what I think you want and may be instructive. In this example, "snippet.txt" sits in the same directory as the PHP script and contains your sample HTML.

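<?php
// Read the sample HTML, stored next to this script as snippet.txt.
$contents = file_get_contents("snippet.txt");
if ($contents === false) {
    die("Unable to open file!");
}

// "s" (DOTALL) lets "." match newlines; the lazy ".*?" stops at the first "mailto:".
$pattern = '/Traveler\s+email.*?mailto:(.*?)"/s';
preg_match($pattern, $contents, $matches);

// $matches[0] is the whole match, $matches[1] is the captured email address.
print_r($matches);
?>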

Running this on the command line with:

php -f thescript.php

you get the result:

Array
(
    [0] => Traveler email </td>
                           <td style="word-break:break-word;border-collapse:collapse;padding:5px 10px;vertical-align:top;text-align:left;color:rgb(51,51,51);font-family:Helvetica,Arial,sans-serif;font-weight:normal;margin:0;line-height:19px;font-size:14px;width:270px;border-bottom:1px dotted rgb(212,212,212)">
                        <div align="right">
                         <a href="mailto:anarky@gmail.com"
    [1] => anarky@gmail.com
)

The pattern:

$pattern = '/Traveler\s+email.*?mailto:(.*?)"/s';

sets the DOTALL flag with the "s" at the end and uses the lazy quantifier ".*?", which matches as few characters as possible before whatever follows it in the pattern. Without DOTALL, ".*?" would not cross newlines, and you would get no match.
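
To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without "s". The $html string is a hypothetical, trimmed-down stand-in for the snippet in the question, not the full markup:

```php
<?php
// Hypothetical shortened sample: the label and the link sit on different lines.
$html = "<td> Traveler email </td>\n<td>\n<a href=\"mailto:anarky@gmail.com\">anarky@gmail.com</a>\n</td>";

// Without "s", "." stops at each newline, so the pattern cannot reach "mailto:".
$withoutDotall = preg_match('/Traveler\s+email.*?mailto:(.*?)"/', $html, $m1);

// With "s", "." also matches newlines; the lazy ".*?" stops at the first "mailto:".
$withDotall = preg_match('/Traveler\s+email.*?mailto:(.*?)"/s', $html, $m2);

var_dump($withoutDotall); // int(0)
var_dump($withDotall);    // int(1)
echo $m2[1], "\n";        // anarky@gmail.com
```

preg_match returns 1 on a match and 0 on no match, so the two var_dump calls show directly that only the DOTALL version finds the address.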

Max Mammel
  • Hi, it could work. What is the last `s` character for? Thank you. – Fatimah Wulandari Oct 06 '16 at 05:38
  • The s is a "pattern modifier" that sets the "DOTALL" flag. See [here](http://php.net/manual/en/reference.pcre.pattern.modifiers.php) for more details. You can set multiple flags in this way. If you replace "s" with "si" for example, the pattern will be case-insensitive. – Max Mammel Oct 06 '16 at 05:42
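
As a rough illustration of combining modifiers as described in the comment above, the following sketch compares "s" with "si". The upper-case label in $html is an assumed input made up for this example:

```php
<?php
// Hypothetical input where the label's capitalization differs from the pattern.
$html = "<td> TRAVELER EMAIL </td>\n<a href=\"mailto:anarky@gmail.com\">";

// "s" alone is case-sensitive, so the upper-case label is not found.
$caseSensitive = preg_match('/Traveler\s+email.*?mailto:(.*?)"/s', $html);

// "si" combines DOTALL with case-insensitive matching.
$caseInsensitive = preg_match('/Traveler\s+email.*?mailto:(.*?)"/si', $html, $matches);

var_dump($caseSensitive);   // int(0)
var_dump($caseInsensitive); // int(1)
echo $matches[1], "\n";     // anarky@gmail.com
```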