I'm trying to translate some JavaScript that uses the replace function into PHP. The JS replace takes a callback that receives the match offset and the source string. I tried preg_replace_callback, but unlike the JS callback, the PHP callback is not given the offset.

JavaScript function below:

log.replace(/(?:<del>(.|\n)*?<\/del>)|(?:<ins>(.|\n)*?<\/ins>)/g, 
 function(match, p1, p2, offsetval, strval) {
  //does something with the offsetval and strval
 });

Is there any easy way to do this with preg_replace_callback or preg_match with callback? It's really just matching rather than replacing.

The issue is preg_match_all supports offset capturing but not callbacks and preg_replace_callback supports callbacks but not offsets!!!

I found this function on GitHub: https://gist.github.com/hakre/5376227

Is there a simpler way?

xmxmxmx
  • You'd better start off by providing your replace functionality in JS. – revo Jun 17 '18 at 08:35
  • it probably won't help but I've included the replace function which I'm trying to convert into php. – xmxmxmx Jun 17 '18 at 09:19
  • It will. Please be more specific. Put the real regex there. – revo Jun 17 '18 at 09:20
  • ok done, still don't think it'll help much as PHP doesn't have offset vals in preg_replace_callback, the regex scans this sort of text "a horsecat jumps" and I need to know the offset of the match to find the position of the text inside the ins or del command. – xmxmxmx Jun 17 '18 at 09:22

3 Answers

Unfortunately, preg_replace_callback doesn't pass any argument that tracks offsets, but there is a workaround. I changed your regex into a better-performing one and added another branch to the alternation: (?P<DOT>[\s\S]). This branch matches a single character whenever the first branch doesn't match at the current position. In other words, the regex steps forward one character at a time when the desired pattern doesn't match, so every character of the input is consumed by some match and the running total of match lengths is the current offset.

$str = "The color is <del>blue</del> or <ins>red!</ins>";
$offset = 0;
preg_replace_callback('/<(del|ins)>[\s\S]*?<\/\1>|(?P<DOT>[\s\S])/',
    function($m) use (&$offset) {
        // ... use $offset and $m[0] here ...
        $offset += strlen($m[0]); // $m[0] contains at least one character
        return $m[0]; // return the match so the text is left unchanged
    },
    $str
);

If I add echo $offset, "|", $m[0], "\n"; right before the $offset += ... line, we get this output:

0|T
1|h
2|e
3| 
4|c
5|o
6|l
7|o
8|r
9| 
10|i
11|s
12| 
13|<del>blue</del>
28| 
29|o
30|r
31| 
32|<ins>red!</ins>
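
For comparison, PHP 7.4 and later accept a flags argument as the sixth parameter of preg_replace_callback. With PREG_OFFSET_CAPTURE, each entry of $m becomes a [text, byte offset] pair, so the extra DOT branch is not needed. The following is only a minimal sketch, assuming PHP 7.4+; the $found array is an illustrative name, not part of the original code.

```php
<?php
// Assumes PHP 7.4+, where preg_replace_callback() takes a sixth $flags argument.
$str = "The color is <del>blue</del> or <ins>red!</ins>";
$found = []; // illustrative: collects [byte offset, matched text] pairs

preg_replace_callback(
    '/<(del|ins)>[\s\S]*?<\/\1>/',
    function (array $m) use (&$found) {
        // With PREG_OFFSET_CAPTURE every group is [text, byte offset].
        [$text, $offset] = $m[0];
        $found[] = [$offset, $text];
        return $text; // return the match so the subject is left unchanged
    },
    $str,
    -1,      // no limit on the number of replacements
    $count,  // receives the number of replacements made
    PREG_OFFSET_CAPTURE
);

foreach ($found as [$offset, $text]) {
    echo $offset, "|", $text, "\n";
}
```

On the sample string this prints 13|<del>blue</del> and 32|<ins>red!</ins>, the same offsets as the tag matches in the output above.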
revo

For those looking for a way to solve this: I ended up using strpos:

$string = 'some text';
$position = 0;
$callback = function (array $match) use ($string, &$position) {
    $offset = strpos($string, $match[0], $position);
    $position = $offset + strlen($match[0]);
    // do your stuff
    return 'replacement';
};
preg_replace_callback('/regex/', $callback, $string);

This gives you the same offset that the other preg_* functions report, which is in bytes. Keep this in mind when working with multibyte character sets.
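
If a character offset is needed instead, one option is to count the characters in the bytes that precede the match. This is a rough sketch, assuming the mbstring extension is loaded, the subject is UTF-8, and the source file is saved as UTF-8; $charOffsets and the sample string are illustrative.

```php
<?php
// Assumes the mbstring extension and a UTF-8 subject; names are illustrative.
$string = "été <ins>é</ins>";
$position = 0;
$charOffsets = []; // collects [byte offset, character offset] pairs

$callback = function (array $match) use ($string, &$position, &$charOffsets) {
    // Byte offset, found the same way as in the snippet above.
    $byteOffset = strpos($string, $match[0], $position);
    $position = $byteOffset + strlen($match[0]);
    // Character offset: number of characters in the bytes before the match.
    $charOffset = mb_strlen(substr($string, 0, $byteOffset), 'UTF-8');
    $charOffsets[] = [$byteOffset, $charOffset];
    return $match[0]; // leave the text unchanged
};
preg_replace_callback('/<ins>.*?<\/ins>/u', $callback, $string);

foreach ($charOffsets as [$bytes, $chars]) {
    echo "byte ", $bytes, ", char ", $chars, "\n";
}
```

Here the match starts at byte 6 but at character 4, because each é takes two bytes in UTF-8.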

vstelmakh

You can use the lightweight T-Regx library, which has offset() and byteOffset() methods:

pattern('(?:<del>(.|\n)*?<\/del>)')->replace($s)->first()->callback(function (Match $match) {

    $match->offset();       // offset in characters
    $match->byteOffset();   // offset in bytes

});

You can read more about them here: https://t-regx.com/docs/match-offsets

Danon