0

Problem: authors have added email addresses wrongly in a CMS - missing out the 'mailto:' text.

I need a regular expression, if possible, to do a search and replace on the stored MySQL content table.

Cases I need to cope with are:

  1. No 'mailto:'
  2. 'mailto:' is already included (correct)
  3. web address not email - no replace
  4. multiple mailto: required (more than one in string)

Sample string would be: (line breaks added for readability)

<a href="add1@test.com">add1@test.com</a> and
<a href="mailto:add2@test.com">add2@test.com</a> and
<a href="http://www.test.com/">real web link</a>
second one to replace <a href="add3@test.com">add3@test.com</a>

Required output would be:

<a href="mailto:add1@test.com">add1@test.com</a> and
<a href="mailto:add2@test.com">add2@test.com</a> and
<a href="http://www.test.com/">real web link</a>
second one to replace <a href="mailto:add3@test.com">add3@test.com</a>

What I tried (in PHP) and issues:

pattern:   /href="(.+?)(@)(.+?)(<\/a> )/iU
replacement:    href="mailto:$1$2$3$4

This is adding mailto: to the correctly formatted mailto: and acting greedily over the last two links.

Thanks for any help. I have looked about, but am running out of time on this as it was an unexpected content issue.

If you are able to save me time and give the SQL expression, that would be even better.

Kerem
  • 11,377
  • 5
  • 59
  • 58
mjpg
  • 23
  • 5
  • MySQL does not have REGEXP replace built in. There are user-defined functions, but it looks like a dump followed by off-line processing is the best option - see: [Stack Overflow discussion how-to-do-a-regular-expression-replace-in-mysql](http://stackoverflow.com/questions/986826/how-to-do-a-regular-expression-replace-in-mysql) – mjpg Feb 13 '13 at 13:38

3 Answers3

1

Try replace

/href="(?!(mailto:|http:\/\/|www\.))/iU

with

href="mailto:

?! loosely means "the next characters aren't these".

Alternative:

Replace

/(href=")(?!mailto:)([^"]+@)/iU

with

$1mailto:$2

[^"]+ means 1 or more characters that aren't ".

You'd probably need a more complex matching pattern for guaranteed correctness.

MySQL REGEX matching:

See this or this.

Community
  • 1
  • 1
Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
  • First one works OK in PHP but needs escaping: `/href="(?!(mailto:|http:\/\/|www\.))/iU';` – mjpg Feb 13 '13 at 13:27
  • Second one gives duplicate where the mailto already exists: `mailto:mailto:` Many thanks for this - I'll try first in SQL – mjpg Feb 13 '13 at 13:29
  • As stated above MySQL does not have REGEXP replace built in, but this was the code I used off-line in an SQL dump file. Thanks @Dukeling also for the pointers to MySQL. Overall problem solved! – mjpg Feb 13 '13 at 14:57
1

You need to apply a proper mail pattern first (e.g: Using a regular expression to validate an email address), second search for mailto:before mail or nothing (e.g: (mailto:|)), and last preg_replace_callback suits for this.

This looks like working as you wish (searching only email addresses in double quotes);

$s = '<a href="add1@test.com">add1@test.com</a> and 
<a href="mailto:add2@test.com">add2@test.com</a> and 
<a href="http://www.test.com/">real web link</a> 
second one to replace <a href="add3@test.com">add3@test.com</a>';
echo preg_replace_callback(
    '~"(mailto:|)([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))"~i', 
    function($m) {
        // print_r($m); @debug
        return '"mailto:'. $m[2] .'"';
    },
    $s
);

Output as you desired;

<a href="mailto:add1@test.com">add1@test.com</a> and 
<a href="mailto:add2@test.com">add2@test.com</a> and 
<a href="http://www.test.com/">real web link</a> 
second one to replace <a href="mailto:add3@test.com">add3@test.com</a>
Community
  • 1
  • 1
Kerem
  • 11,377
  • 5
  • 59
  • 58
  • Many thanks. This worked on my test except I have PHP 5.2 so I had to use a normal function: ` function cbfunc($m) { return '"mailto:'. $m[2] .'"'; } $newstr = preg_replace_callback( '~"(mailto:|)([_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))"~i', cbfunc, $s ); ` However I needed SQL, so preg_replace_callback() not available. – mjpg Feb 13 '13 at 13:18
0

Use the following as pattern:

/(href=")(?!mailto:)(.+?@.+?")/iU

and replace it with

$1mailto:$2

(?!mailto:) is a negative lookahead checking whether a mailto: follows. If there is no such one, remaining part is checked for matching. (.+?@.+?") matches one or more characters followed by a @ followed by one or more characters followed by a ". Both + are non-greedy.

The matched pattern is replaced with first capture group (href=") followed by mailto: followed by second capture group (upto closing ").

Naveed S
  • 5,106
  • 4
  • 34
  • 52