0

I have a text in PHP stored in the variable $row. I'd like to find the positions of a certain word, and that's quite easy. What's not so easy is making my code recognize whether the match it has found is exactly the word I'm looking for or only part of a larger word. Is there a way to do it?

Example of what I'd like to obtain

CODE:

$row= "some ugly text of some kind i'd like to find in someway";
$token= "some";
$pos= -1;
$counter= substr_count($row, $token);
for ($h=0; $h<$counter; $h++) {
     $pos= strpos($row, $token, $pos+1);
     echo $pos.' ';
}

OUTPUT:

what I obtain:

0 17 47

what I'd like to obtain

0 17

Any hint?

Jannuzzo
  • You mean 0, 18, 48 :) Have you tried regexp with word boundaries? – Max Mar 12 '14 at 12:06
  • try giving `$token = " some ";` (i.e. space before and after your token) if you want the position of that word only... Hope I got the question correctly... if not then please try to elaborate – sumitb.mdi Mar 12 '14 at 12:08
  • @sumitb.mdi this could work almost perfectly.. but what if the token is at the start or at the end of the string? – Jannuzzo Mar 12 '14 at 12:13

3 Answers

3

Use preg_match_all() with word boundaries (\b):

$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $m, PREG_OFFSET_CAPTURE);

Here, preg_quote() escapes the user input so that it can be used safely inside our regular expression. Some characters have a special meaning in regular expression language; without proper escaping, those characters are interpreted as regex syntax rather than as literal text, and your regex might not work as intended.
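To see why the escaping matters, here is a minimal sketch. The sample string and the token "a.b" are hypothetical, chosen only because "." is a regex metacharacter; they are not from the question.

```php
<?php
// Hypothetical inputs for illustration only.
$row   = "axb a.b";
$token = "a.b";

// Unescaped: "." matches any character, so "axb" is matched as well.
preg_match_all("/\b$token\b/", $row, $raw, PREG_OFFSET_CAPTURE);

// Escaped: preg_quote() turns "." into "\.", so only the literal "a.b" matches.
$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $quoted, PREG_OFFSET_CAPTURE);

// Pull the offset (index 1) out of every [match, offset] pair.
$rawOffsets    = array_column($raw[0], 1);    // [0, 4]
$quotedOffsets = array_column($quoted[0], 1); // [4]

echo implode(' ', $rawOffsets), "\n";
echo implode(' ', $quotedOffsets), "\n";
```

The second argument to preg_quote() is the delimiter, so a "/" inside the token gets escaped too.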

In the preg_match_all() statement, we are supplying the following regex:

/\b$search\b/

Explanation:

  • / - starting delimiter
  • \b - word boundary. A word boundary, in most regex dialects, is a position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string.
  • $search - escaped search term
  • \b - word boundary
  • / - ending delimiter
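A short sketch of how \b behaves when the token sits at the very start or the very end of the string, which is the case where padding the token with spaces falls short. The sample string is an assumption made up for this illustration.

```php
<?php
// Hypothetical sample: the token is the first and the last word.
$row = "some text with some";

// Space-padded search: finds nothing, since neither occurrence
// has a space on both sides.
$naive = strpos($row, " some ");

// Word boundaries: both occurrences are found.
preg_match_all('/\bsome\b/', $row, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1);

var_dump($naive);                  // bool(false)
echo implode(' ', $offsets), "\n"; // 0 15
```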

In plain English, it means: find all occurrences of the given word (here, some) where it stands as a whole word, not as part of a longer word.

Note that we're also using the PREG_OFFSET_CAPTURE flag here. If this flag is passed, for every occurring match the appendant string offset will also be returned. See the documentation for more information.

To obtain the results you want, you can simply loop through the $m array and extract the offsets:

$result = implode(' ', array_map(function($arr) {
    return $arr[1];
}, $m[0]));

echo $result;

Output:

0 18

Amal Murali
  • The answer seeker wishes `0 17` as output.. Can you please suggest how can he get that from your code? – Tzar Mar 12 '14 at 12:20
  • @Tzar: The output they're hoping to get is probably wrong. `some ugly text of s` — I see **`18` characters** before the second `s`. Maybe it was a counting mistake in the original question? – Amal Murali Mar 12 '14 at 12:22
  • You're missing the point my friend.. Am talking about the formatting & display.. Needs position numbers separated by spaces.. – Tzar Mar 12 '14 at 12:33
  • @Tzar: I thought that was easy. Anyway I've updated the answer to include an explanation. Thanks for the heads-up. – Amal Murali Mar 12 '14 at 12:43
2

What you're looking for is a combination of a regex with a word-boundary pattern and the flag that returns the offsets (PREG_OFFSET_CAPTURE).

PREG_OFFSET_CAPTURE

If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the value of matches into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1.

$row= "some ugly text of some kind i'd like to find in someway";
$pattern= "/\bsome\b/i";
preg_match_all($pattern, $row, $matches, PREG_OFFSET_CAPTURE);

And we get something like this:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [0] => some
                    [1] => 0
                )
            [1] => Array
                (
                    [0] => some
                    [1] => 18
                )
        )
)

Then just loop through the matches and extract the offset at which the needle was found in the haystack.

// store the positions of the match
$offsets = array();
foreach($matches[0] as $match) {
    $offsets[] = $match[1];
}

// display the offsets
echo implode(' ', $offsets);
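
If this is needed in more than one place, the steps above could be wrapped in a small function. This is only a sketch: the name findWordOffsets and its signature are hypothetical, and it assumes PHP 7 or later for the type declarations.

```php
<?php
/**
 * Hypothetical helper (name and signature are illustrative only).
 * Returns the offsets at which $token occurs as a whole word in $row.
 */
function findWordOffsets(string $row, string $token): array
{
    // Escape the token so it is matched literally, then wrap it in \b.
    $pattern = '/\b' . preg_quote($token, '/') . '\b/';

    if (preg_match_all($pattern, $row, $matches, PREG_OFFSET_CAPTURE) === false) {
        return []; // the regex engine reported an error
    }

    // Each entry of $matches[0] is [matched string, offset]; keep the offsets.
    return array_column($matches[0], 1);
}

$row = "some ugly text of some kind i'd like to find in someway";
echo implode(' ', findWordOffsets($row, 'some')), "\n"; // 0 18
```

Two caveats: the offsets are byte offsets, so they can differ from character positions in multibyte text, and \b only behaves as a whole-word test when the token itself starts and ends with a word character.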
Max
-1

Use preg_match():

if(preg_match("/some/", $row))
// [..]

The first argument is a regex, which can match virtually anything you want. There are, however, dire warnings against using regexes to match things like HTML.
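
As a hedged illustration of why this pattern alone does not answer the question: without word boundaries it also matches inside longer words. The sample string below is made up for the example.

```php
<?php
// Hypothetical string: "some" appears only inside the longer word "someway".
$row = "find it in someway";

$plain   = preg_match('/some/', $row);     // 1: matches inside "someway"
$bounded = preg_match('/\bsome\b/', $row); // 0: no standalone "some"

echo $plain, ' ', $bounded, "\n"; // 1 0
```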

rm-vanda
  • Don't really think this will solve the OP's problem, but I've edited the answer to "fix" the code. And removed my downvote :) – Amal Murali Mar 12 '14 at 19:40
  • You're right, the selected answer is much better - and besides, I misread the question. But thank you - – rm-vanda Mar 12 '14 at 20:05