
I would like to check whether the character preceding a search pattern is alphanumeric.

If it is, do nothing.

If it is not, remove the leading space of the search pattern.

For example:

$string1 = "This is a test XYZ something else";

$string2 = "This is a test? XYZ something else";

$pattern = " XYZ";

In the $string1 scenario, the character preceding the search pattern is "t", which counts as a match, so nothing should be done.

In the $string2 scenario, the character preceding the search pattern is "?", which counts as a non-match, so I want to remove the extra space from the search pattern.

The result should be:

$string2 = "This is a test?XYZ something else";

How can this be accomplished in PHP?

KDX

1 Answer


You may use a \B XYZ pattern with preg_replace_callback to trim the matched value and insert it back:

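$string1 = "This is a test XYZ something else";
$string2 = "This is a test? XYZ something else";
$pattern = " XYZ";
// \B before the leading space: match only if the char before the space is a non-word char.
// preg_quote() escapes any regex metacharacters that the search text might contain.
$regex = '~\B' . preg_quote($pattern, '~') . '~';
echo preg_replace_callback($regex, function($m) { return trim($m[0]); }, $string1) . PHP_EOL;
// => This is a test XYZ something else
echo preg_replace_callback($regex, function($m) { return trim($m[0]); }, $string2);
// => This is a test?XYZ something else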


Since \B (a non-word boundary) matches at every position where a word boundary \b does not, the pattern \B XYZ only matches when the space is preceded by a non-word char or sits at the very start of the string.
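One way to see this is to probe the question's two strings, plus a hypothetical third string where " XYZ" opens the text, with preg_match(). The sketch below also runs the negative lookbehind (?<!\w) side by side, since before a non-word char such as this space it should behave exactly like \B; the three lines should print "0 0", "1 1" and "1 1" respectively.

```php
<?php
// The question's subjects, plus a made-up one where " XYZ" starts the string.
$subjects = [
    "This is a test XYZ something else",  // 't' (word char) before the space
    "This is a test? XYZ something else", // '?' (non-word char) before the space
    " XYZ at the start",                  // no char at all before the space
];
foreach ($subjects as $s) {
    // Before a space, \B succeeds only if the previous char is not a word char
    // (or there is no previous char), which is what (?<!\w) also checks.
    $viaB          = preg_match('~\B XYZ~', $s);      // 1 = match, 0 = no match
    $viaLookbehind = preg_match('~(?<!\w) XYZ~', $s);
    echo $viaB, ' ', $viaLookbehind, ' | ', $s, PHP_EOL;
}
```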

More details: your pattern starts with a space, which is a non-word char. By adding \B before it, we require that the character before the space also be a non-word char (or that there be no character at all); otherwise, there is no match. A word char is a char from the [a-zA-Z0-9_] range. If you need to customize the boundary, use a lookbehind like (?<![a-zA-Z0-9]) instead, so that an underscore before the space no longer blocks the match.
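To illustrate the difference, here is a minimal sketch on a made-up subject in which an underscore sits directly before one of the spaces. \B treats the underscore as a word char and keeps that space, while the (?<![a-zA-Z0-9]) lookbehind does not, so it removes both spaces.

```php
<?php
// Hypothetical subject: an underscore directly before the first " XYZ".
$subject = "snake_ XYZ and test? XYZ";
$trim = function ($m) { return ltrim($m[0], ' '); }; // drop the leading space only

// \B counts "_" as a word char, so the space after "snake_" is kept.
$withB = preg_replace_callback('~\B XYZ~', $trim, $subject);

// (?<![a-zA-Z0-9]) protects only letters and digits, so that space is removed too.
$withLookbehind = preg_replace_callback('~(?<![a-zA-Z0-9]) XYZ~', $trim, $subject);

echo $withB, PHP_EOL;          // snake_ XYZ and test?XYZ
echo $withLookbehind, PHP_EOL; // snake_XYZ and test?XYZ
```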

For more information on non-word boundaries, see the Stack Overflow thread "What are non-word boundary in regex (\B), compared to word-boundary?".
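If the regex route feels opaque (KDX says as much in the comments), the question's rule can also be written without one. This is a swapped-in technique, not part of the answer: a minimal sketch that scans with strpos() and tests the byte before each hit with ctype_alnum(). The function name trimSpaceAfterNonAlnum is hypothetical, the sketch assumes the search text starts with a single space as " XYZ" does, and because it works byte by byte it only recognises ASCII letters and digits in the "C" locale.

```php
<?php
// Hypothetical helper: a non-regex version of the rule in the question.
// Assumes $pattern starts with one space, like " XYZ".
function trimSpaceAfterNonAlnum(string $subject, string $pattern): string
{
    if ($pattern === '') {
        return $subject; // an empty needle would never advance $offset
    }
    $result = '';
    $offset = 0;
    while (($pos = strpos($subject, $pattern, $offset)) !== false) {
        // Copy the text between the previous hit and this one unchanged.
        $result .= substr($subject, $offset, $pos - $offset);
        // Byte just before the leading space; '' when the hit is at position 0.
        $prev = $pos > 0 ? $subject[$pos - 1] : '';
        if ($prev !== '' && ctype_alnum($prev)) {
            $result .= $pattern;             // alphanumeric: keep the space
        } else {
            $result .= ltrim($pattern, ' '); // anything else: drop the space
        }
        $offset = $pos + strlen($pattern);
    }
    // Append whatever follows the last hit.
    return $result . substr($subject, $offset);
}

echo trimSpaceAfterNonAlnum("This is a test XYZ something else", " XYZ"), PHP_EOL;
// This is a test XYZ something else
echo trimSpaceAfterNonAlnum("This is a test? XYZ something else", " XYZ"), PHP_EOL;
// This is a test?XYZ something else
```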

Wiktor Stribiżew
  • If you need to be more specific than any non-word boundary, you could use a negative look-behind, such as `(?<![a-zA-Z0-9]) XYZ` – Steven Doggart Jul 14 '16 at 15:05
  • @Wiktor-Stribiżew The solution works, yet I'm having a hard time understanding it. The only difference I see is `PHP_EOL`; what does that have to do with alphanumeric characters? – KDX Jul 14 '16 at 15:16
  • @Steven-Doggart Your proposed idea is interesting; would you mind elaborating on it with a sample answer? It looks like it gives me more control over what to match, and it may better suit my project, which handles international languages. – KDX Jul 14 '16 at 15:19
  • Note that \B is equal to `(?<!\w)` here. Yes, you may unwrap it and customize it further. See https://ideone.com/gvWNNL. If you want to treat an underscore as a special char, you can use `(?<![^\W_])` to exclude it from `\w`. – Wiktor Stribiżew Jul 14 '16 at 15:22
  • `PHP_EOL` only inserts a line break between the demo results; it has nothing to do with the solution I suggest. – Wiktor Stribiżew Jul 14 '16 at 15:32
  • @KDX There's really nothing to show. I was just suggesting an alternative pattern which would give you some extra control, if need be. The only difference would be the pattern string. None of the rest of the code would be affected. – Steven Doggart Jul 14 '16 at 15:34
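KDX mentions international languages in the comments. As a sketch only, assuming UTF-8 input: with the /u modifier, PCRE's Unicode properties \p{L} (any letter) and \p{N} (any number) can stand in for the ASCII ranges in the lookbehind. The subject string below is made up. Without /u, the ASCII lookbehind only sees the last byte of "é", treats it as non-alphanumeric and removes the space after "café"; the Unicode version keeps it.

```php
<?php
// Sketch, assuming UTF-8 input: letters/digits from any script count as alphanumeric.
$pattern = ' XYZ';
$quoted  = preg_quote($pattern, '~'); // escape regex metacharacters in the search text
$subject = "Visit the café XYZ or the bar? XYZ"; // made-up example text
$trim    = function ($m) { return ltrim($m[0], ' '); };

// ASCII class, no /u: the lookbehind sees the last byte of "é", not in [a-zA-Z0-9].
$ascii = preg_replace_callback('~(?<![a-zA-Z0-9])' . $quoted . '~', $trim, $subject);

// Unicode properties with /u: "é" is a letter (\p{L}), so its space is kept.
$unicode = preg_replace_callback('~(?<![\p{L}\p{N}])' . $quoted . '~u', $trim, $subject);

echo $ascii, PHP_EOL;   // Visit the caféXYZ or the bar?XYZ
echo $unicode, PHP_EOL; // Visit the café XYZ or the bar?XYZ
```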