
I have a script that downloads the latest newsletter from a group inbox on a spare touchscreen in our office. It works fine, but people keep accidentally unsubscribing us, so I want to remove the unsubscribe link from the email.

preg_replace seems like it would work, because I can set up a pattern that simply removes any link with the word "unsubscribe" in it. I validated the pattern below using the tool at http://regex101.com/, and it even picks up variations like "manage subscription" as well. It is OK if the odd legitimate link containing the word "subscribe" also gets removed; there won't be many and it's only for internal use.

However, when I run it I get an error.

Here's my code:

line 53: $pat='<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>';

line 54: $themail[bodycontent]= preg_replace($pat, ' ',$themail[bodycontent]);

and I get this error:

preg_replace() [function.preg-replace]: Unknown modifier ']' in /home/trev/public_html/bigscreen/screen-functions.php on line 54

It must be something really simple like an unescaped char but I have gone code blind and can't for the life of me see it.

How do I get this pattern:

<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>

to run in a simple php script?

Thanks

Funk Forty Niner

5 Answers


You have no delimiter. Or rather you do, but it's not the one you meant. PCRE is interpreting your first < as the opening delimiter (you can use matching brackets as delimiters - in fact, I use parentheses to help remind myself that the entire match is index 0). Then it sees the first > as the ending delimiter. Anything after that should be a modifier, but of course ] is not a modifier.

Wrap your regex with (...) to give it a proper set of delimiters.
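
As a rough illustration of the point above, the sketch below first runs the pattern without delimiters and then wrapped in ( ). The sample HTML and the variable names $html, $bad, $good, $broken and $clean are invented for the example. Without delimiters, the scan for the closing > stops at the > inside the first [^>], which is why the ] after it is reported as an unknown modifier.

```php
<?php
// Sample HTML, invented for illustration.
$html = '<p>News</p><a href="http://example.com/u">Unsubscribe</a>';

// No delimiters: "<" opens, the ">" inside [^>] closes, "]" is read as a modifier.
$bad = '<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>';
$broken = @preg_replace($bad, ' ', $html); // @ hides the warning; the call returns null

// The outer ( ) are now the delimiters; the original pattern sits between them.
$good = '(' . $bad . ')';
$clean = preg_replace($good, ' ', $html);

var_dump($broken); // NULL
echo $clean, "\n"; // the link is replaced by a single space
```

Checking the return value for null is a cheap way to notice a pattern that failed to compile, since preg_replace does not return the original string in that case.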

Niet the Dark Absol
  • Thanks - that's a good explanation of the code snippet from Dale - I just need to sit down and absorb why it worked! – Steve Lownds Aug 15 '13 at 11:26

You haven't used any delimiters, so it's treating the < character as the delimiter.

Try something like this instead:

$pat='#<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>#';
Dale
  • Brilliant - that worked. I'll be honest - I don't fully know why, to my mind the starting and closing < > would be the keys to it all, but I kind of see why it needs something that isn't the actual content it is matching. – Steve Lownds Aug 15 '13 at 11:23

$themail[bodycontent] should be either $themail['bodycontent'] or $themail[$bodycontent].

It's trying to parse bodycontent] ... as the array index.
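
A minimal sketch of the quoted-key form, assuming the array holds the email body under the string key 'bodycontent' (the sample value is invented; the pattern is the # delimited one from the answer above). An unquoted bodycontent is read as a constant name: older PHP versions fall back to the string with a notice, while PHP 8 throws an Error for an undefined constant.

```php
<?php
// Invented sample data standing in for the real email array.
$themail = ['bodycontent' => '<a href="/u">Unsubscribe</a> Hello'];

$pat = '#<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>#';

// Quoted key: a string literal, not a constant lookup.
$themail['bodycontent'] = preg_replace($pat, ' ', $themail['bodycontent']);

echo $themail['bodycontent'], "\n";
```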

  • No it's not. See ["Why is `$foo[bar]` wrong?"](http://php.net/array). – Niet the Dark Absol Aug 15 '13 at 11:16
  • You're right that this is a mistake in his code and your suggested fix is correct, but this isn't the cause of the error he's asking about in the question and your analysis of why it's a bug doesn't seem to be quite right either. – Spudley Aug 15 '13 at 11:18
  • Of course, didn't think about constants. – MattLicense Aug 15 '13 at 11:20
  • To be honest I did simplify the code I published a bit to conceal my var names / file paths a bit, and the error you corrected was mainly me oversimplifying. However, I did wonder if it was more of a PHP issue than just regex matching so I did try various versions with and with out the ' in the var name, but I could easily have missed the right one. I have put the suggestions above in place to correct my expression, and also avoided this issue by putting the value into its own $variable to be used in the preg_replace, just to be safe. Thanks. – Steve Lownds Aug 15 '13 at 11:28

Patterns used in the preg_* functions (preg_match, preg_replace and so on) need to be enclosed by a pair of delimiter characters.

For example, a / or a ~ at the start and end of the string.

Anything outside of these delimiters at the end of the string is considered to be a regex "modifier".
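
To make the delimiter/modifier split concrete, here is a small sketch that puts a real modifier after the closing delimiter. The sample HTML is invented, and swapping (a|A) for a plain a plus the i (case-insensitive) modifier is my own simplification, not part of the original pattern.

```php
<?php
// Sample HTML, invented for illustration.
$html = '<A HREF="http://example.com/m">Manage subscription</A> <a href="/news">Read more</a>';

// ~ ... ~ are the delimiters; the trailing i is a modifier (case-insensitive).
// With i set, a plain "a" matches both <a> and <A>, so (a|A) is not needed.
// "/" needs no escaping here because it is not the delimiter.
$pat = '~<\s*a\s+[^>]*>[^<>]*ubscri[^<>]*<\s*/a\s*>~i';

$out = preg_replace($pat, ' ', $html);
echo $out, "\n"; // only the "Manage subscription" link is removed
```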

Your example doesn't have delimiters, so PHP is wrongly assuming that the < character is the delimiter. It therefore sees the next < character as the closing delimiter, and therefore, anything after that as a modifier. Obviously all that stuff is supposed to be inside the pattern and isn't valid as modifiers, which is why PHP is complaining.

Solution: Add a pair of delimiter characters:

$pat='~<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>~';
      ^                                                   ^
    add this                                         ...and this

(It doesn't have to be ~; you can choose your own delimiter character to suit your needs. The best one to use is one that doesn't occur in your pattern, although you can escape it with a backslash if it does.)
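
If part of the pattern comes from a variable, escaping the delimiter by hand is easy to get wrong. A hedged sketch using preg_quote, whose second argument names the delimiter to escape; the link text 'opt~out' and the variable names are invented for the example.

```php
<?php
// Invented link text that happens to contain the delimiter character.
$word = 'opt~out';

// preg_quote escapes regex metacharacters and, via the second argument, the delimiter.
$safe = preg_quote($word, '~');

$pat = '~<\s*a\s+[^>]*>[^<>]*' . $safe . '[^<>]*<\s*/a\s*>~i';

$html = '<a href="/x">opt~out here</a>done';
$out = preg_replace($pat, '', $html);

echo $safe, "\n"; // opt\~out
echo $out, "\n";  // done
```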

Spudley
  • PHP is smarter than you give it credit for. If `<` is the opening delimiter, then `>` is the closing one. Same for `(...)`, `[...]` and `{...}`. All of these may be used as delimiters and do not require escaping of any characters inside the pattern since they come in matching pairs. – Niet the Dark Absol Aug 15 '13 at 11:18
  • @Kolink - I accept your point. I tend to avoid that kind of delimiter as it creates confusion when porting patterns between languages. The fact remains the same though; it's picking up a closing delimiter mid-string and treating the rest of the pattern as invalid modifiers. And I think his intention is to have the `<>` brackets as part of the pattern, so the solution remains the same too. – Spudley Aug 15 '13 at 11:24

Start and end the pattern with a slash (/):

$pat='/<\s*(a|A)\s+[^>]*>[^<>]*ubscri[^<>]*<\s*\/(a|A)\s*>/';
Bora