I have the following code to get characters before/after the regex match:

$searchterm = 'blue';
$string = 'Here is a sentence talking about blue.  This sentence talks about red.';
$regex = '/.*(.{10}\b' . $searchterm . '\b.{10}).*/si';
echo preg_replace($regex, '$1', $string);

Output: "ing about blue.  This se" (expected).

When I change it to $searchterm = 'red', I get this:

Output: "Here is a sentence talking about blue.  This sentence talks about red."

I am expecting this: "lks about red." The same thing happens when the search term is near the beginning of the string. Is there a way to use a similar regex that does not return the entire string when the match is near the start or end?

Example of what is happening: https://sandbox.onlinephpfunctions.com/code/e500b505860ded429e78869f61dbf4128ff368b3

user2096091
  • Not a correct dupe as OP already knows how to match 10 characters. – anubhava Feb 26 '21 at 17:39
  • Well, even if it is not the exact dupe, it is exactly the missing bit. Also, since the answer not following best practices is accepted, this post should be removed. – Wiktor Stribiżew Mar 14 '21 at 20:46
  • Thanks for accepting that the dupe wasn't correct. And it is the prerogative of the OP to choose whatever answer works for him/her. We have thousands of questions on SO that have better answers than the accepted one. – anubhava Mar 15 '21 at 06:39

2 Answers

Converting my comment to answer so that solution is easy to find for future visitors.

Your regex is almost correct, but make sure to use a non-greedy quantifier (.*?) at the start and a .{0,10} limit for the surrounding substring:

$searchterm = 'blue';
$string = 'Here is a sentence talking about blue.  This sentence talks about red.';
$regex = '/.*?(.{0,10}\b' . $searchterm . '\b.{0,10}).*/si';
echo preg_replace($regex, '$1', $string);
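
To see why the original pattern returned the whole string: .{10} demands exactly ten characters on each side, so for 'red' (only one character follows it) the regex does not match at all and preg_replace returns the input unchanged. Below is a minimal sketch of the relaxed pattern, wrapped in a hypothetical helper named extractContext (my own name, not part of the code above), checked against terms in the middle, at the end, and at the start of the string:

```php
<?php
// extractContext() is a hypothetical helper used only for illustration.
// It assumes $searchterm contains no regex metacharacters.
function extractContext(string $searchterm, string $string): string
{
    // Lazy .*? skips as little as possible; .{0,10} allows fewer than
    // ten characters when the term sits near either end of the string.
    $regex = '/.*?(.{0,10}\b' . $searchterm . '\b.{0,10}).*/si';
    return preg_replace($regex, '$1', $string);
}

$string = 'Here is a sentence talking about blue.  This sentence talks about red.';

// The original pattern needs exactly ten characters after the term,
// so it does not match 'red' and the input comes back unchanged.
$original = preg_replace('/.*(.{10}\bred\b.{10}).*/si', '$1', $string);

echo extractContext('blue', $string), "\n"; // ing about blue.  This se
echo extractContext('red', $string), "\n";  // lks about red.
echo extractContext('here', $string), "\n"; // Here is a sent
```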

anubhava

It is better to use preg_match with .{0,10} quantifiers instead of .{10}:

function truncateString($searchterm){
    $string = 'Here is a sentence talking about blue.  This sentence talks about red.';
    $regex = '/.{0,10}\b' . $searchterm . '\b.{0,10}/si';
    if (preg_match($regex, $string, $m)) {
        echo $m[0] . "\n";
    }  
}

truncateString('blue');
// => ing about blue.  This se
truncateString('red');
// => lks about red.

preg_match will find and return the first match only. The .{0,10} pattern will match zero to ten occurrences of any char (since the s modifier is used, the . matches even line break chars).
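
To illustrate the "first match only" behaviour, here is a small sketch that compares preg_match with preg_match_all. The sample sentence is made up for this example and is not from the question:

```php
<?php
// Hypothetical sample text with two occurrences of the search term.
$string = 'The blue car passed a very long red truck near the blue house.';
$regex = '/.{0,10}\bblue\b.{0,10}/si';

// preg_match stops after the first match.
$first = preg_match($regex, $string, $m) ? $m[0] : null;

// preg_match_all collects every non-overlapping match.
preg_match_all($regex, $string, $all);

var_dump($first);  // "The blue car passe"
var_dump($all[0]); // ["The blue car passe", " near the blue house."]
```

Note that the matches cannot overlap: if two occurrences are closer than ten characters apart, the second one ends up inside the context of the first.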

One more thing: if your $searchterm can contain special regex metacharacters anywhere in the term, consider changing the pattern to

$regex = '/.{0,10}(?<!\w)' . preg_quote($searchterm, '/') . '(?!\w).{0,10}/si';

where (?<!\w) / (?!\w) are unambiguous word boundaries (they also work when the term starts or ends with a non-word character) and preg_quote is used to escape all special chars.
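
A short sketch of the difference, assuming a search term such as 'C++' that ends in non-word characters. The helper name contextAround and the sample sentence are hypothetical and used only for illustration:

```php
<?php
// contextAround() is a hypothetical helper name used for illustration.
function contextAround(string $searchterm, string $string): ?string
{
    // preg_quote escapes the term (and the / delimiter); the lookarounds
    // only require that no word character touches the term on either side.
    $regex = '/.{0,10}(?<!\w)' . preg_quote($searchterm, '/') . '(?!\w).{0,10}/si';
    return preg_match($regex, $string, $m) ? $m[0] : null;
}

$string = 'I write C++ and PHP code every day.';

$quoted = preg_quote('C++', '/');
$withLookarounds = contextAround('C++', $string);

// With \b the match fails: there is no word boundary between "+" and " ".
$withWordBoundaries = preg_match('/.{0,10}\b' . $quoted . '\b.{0,10}/si', $string);

var_dump($quoted);             // "C\+\+"
var_dump($withLookarounds);    // "I write C++ and PHP c"
var_dump($withWordBoundaries); // 0
```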

Wiktor Stribiżew
  • Please do not extend your question via comment. Take a moment to understand these answers, then try to adjust them yourself. After some time, toil, and research, then ask a new question if you are stuck. – mickmackusa Feb 26 '21 at 17:12
  • I couldn't figure it out and figured it would be better if it were self contained in this question than posting a new one. – user2096091 Feb 26 '21 at 17:15
  • Stack Overflow pages are not forum threads that read chronologically like a top-to-bottom conversation. There must be only one question on a page (and that question should be 100% in the question body at the top of the page). There is nothing wrong with asking two questions that are related; they just need to be different. Please do not get stuck in a cycle of help vampirism. If you are a developer, please act like one and put in real effort between questions. – mickmackusa Feb 26 '21 at 17:27
  • @user2096091 I think you are asking about using `preg_quote` within an `array_map`, see [this answer of mine](https://stackoverflow.com/questions/59032816/exact-regex-match-with-ampersand-words/59033014#59033014). You would just need to change `~` to `/` as you are using `/` as a regex delimiter. I'd really urge you to use the right method for the current task, and it is `preg_match`. With `.*` or `.*?` at the start, you end up with a lot of redundant backtracking that you do not get with `preg_match` pattern (note the `.*` at the end of a regex pattern has no impact on matching performance). – Wiktor Stribiżew Feb 26 '21 at 17:37