75

I need to parse an HTML document and find all occurrences of the string asdf in it.

I currently have the HTML loaded into a string variable. I just need the character positions so I can loop through them and return some data that follows each occurrence.

The strpos function only returns the first occurrence. How can I get all of them?

Salman A
muncherelli

10 Answers

106

Without using regex, something like this should work for returning the string positions:

$html = "dddasdfdddasdffff";
$needle = "asdf";
$lastPos = 0;
$positions = array();

while (($lastPos = strpos($html, $needle, $lastPos))!== false) {
    $positions[] = $lastPos;
    $lastPos = $lastPos + strlen($needle);
}

// Displays 3 and 10
foreach ($positions as $value) {
    echo $value ."<br />";
}
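Since the question mentions returning some data after each match, here is a minimal sketch of how the collected positions might be used. The helper name text_after_matches and its $length parameter are hypothetical, made up for illustration, and lengths are counted in bytes:

```php
<?php
// Hypothetical helper: maps each match position to the $length bytes that follow the match.
function text_after_matches(string $haystack, string $needle, int $length): array
{
    $results = [];
    $needleLen = strlen($needle);
    if ($needleLen === 0) {
        return $results; // an empty needle would never advance the offset
    }
    $offset = 0;
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        // substr() starts reading right after the end of the match
        $results[$pos] = substr($haystack, $pos + $needleLen, $length);
        $offset = $pos + $needleLen;
    }
    return $results;
}

print_r(text_after_matches("dddasdfdddasdffff", "asdf", 3));
```

For the sample string this should map position 3 to "ddd" and position 10 to "fff".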
trejder
Adam Plocher
  • 14
    Please be careful using assignments in `if` statements. In this case, your `while` loop didn't work for position `0`. I've updated your answer. – Robbert Nov 06 '13 at 12:57
  • 9
    Excellent fix, but for those needing to find special characters (é, ë, ...) replace the strpos with mb_strpos, otherwise it won't work – Brentg Jun 22 '16 at 13:44
  • Anyone reusing this code should be careful: if the needle can overlap itself (for example "dd"), $lastPos should only increase by one inside the while loop. – assensi Aug 07 '20 at 09:52
24

You can call the strpos function repeatedly until no match is found. You must pass the offset parameter so that each call resumes after the previous match position.

Note: in the following example, the search continues from the next character instead of from the end of previous match. According to this function, aaaa contains three occurrences of the substring aa, not two.

function strpos_all($haystack, $needle) {
    $offset = 0;
    $allpos = array();
    while (($pos = strpos($haystack, $needle, $offset)) !== FALSE) {
        $offset   = $pos + 1;
        $allpos[] = $pos;
    }
    return $allpos;
}
print_r(strpos_all("aaa bbb aaa bbb aaa bbb", "aa"));

Output:

Array
(
    [0] => 0
    [1] => 1
    [2] => 8
    [3] => 9
    [4] => 16
    [5] => 17
)
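The overlapping versus non-overlapping behaviour described in the note comes down to how far the offset advances after each match. A minimal sketch that makes this a switch (the function name find_all_positions and the $allowOverlap flag are assumptions for illustration, not part of the answer above):

```php
<?php
// Hypothetical variant: $allowOverlap decides how far to advance after each match.
function find_all_positions(string $haystack, string $needle, bool $allowOverlap = true): array
{
    $positions = [];
    if ($needle === '') {
        return $positions; // avoid looping forever on an empty needle
    }
    // Advance by one byte to allow overlaps, or by the needle length to skip past the match.
    $step = $allowOverlap ? 1 : strlen($needle);
    $offset = 0;
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + $step;
    }
    return $positions;
}

print_r(find_all_positions("aaaa", "aa"));        // overlapping: 0, 1, 2
print_r(find_all_positions("aaaa", "aa", false)); // non-overlapping: 0, 2
```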
Salman A
18

It's better to use substr_count. See the documentation on php.net.

Nirmal Ram
  • 13
    this only gives you the count, not their positions as the question asked – DaveB Jul 07 '16 at 12:18
  • 2
    "This function doesn't count overlapped substrings." For the string 'abababa', searching for 'aba' counts only 2 matches, not 3 – R Picheta Sep 29 '16 at 21:27
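To illustrate the two comments above, here is a small sketch comparing substr_count with a manual loop that also counts overlapping matches. The variable names are only for illustration:

```php
<?php
$haystack = "abababa";
$needle = "aba";

// substr_count() returns only a count of non-overlapping matches, with no positions.
$count = substr_count($haystack, $needle);

// Counting overlapping matches needs a loop that advances one byte at a time.
$overlapping = 0;
$offset = 0;
while (($pos = strpos($haystack, $needle, $offset)) !== false) {
    $overlapping++;
    $offset = $pos + 1;
}

echo $count, " ", $overlapping, "\n"; // prints "2 3"
```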
4
function getocurence($chaine, $rechercher)
{
    $lastPos = 0;
    $positions = array();
    // Search again from just past each match until strpos() returns false.
    while (($lastPos = strpos($chaine, $rechercher, $lastPos)) !== false) {
        $positions[] = $lastPos;
        $lastPos = $lastPos + strlen($rechercher);
    }
    return $positions;
}
Ryodo
  • 3
    Code-only answers are low value on StackOverflow because they do very little to educate the OP and future readers. Please edit your answer with the intent to educate thousands of future SO readers and the OP. – mickmackusa Apr 01 '18 at 11:53
3

Use preg_match_all to find all occurrences.

preg_match_all('/(\$[a-z]+)/i', $str, $matches);

For further reference check this link.

trejder
웃웃웃웃웃
3

This can be done using the strpos() function. The following code uses a for loop: it calls strpos() from every offset in turn and records the offset whenever a match starts exactly there.

<?php

$str_test = "Hello World! welcome to php";

$count = 0;
$find = "o";
$positions = array();
for($i = 0; $i<strlen($str_test); $i++)
{
     $pos = strpos($str_test, $find, $count);
     // Strict comparison: strpos() returns false when nothing is found, and false == 0 is true.
     if($pos === $count){
           $positions[] = $pos;
     }
     $count++;
}
foreach ($positions as $value) {
    echo '<br/>' .  $value . "<br />";
}

?>
Kach
2

Salman A has a good answer, but remember to make your code multibyte-safe. To get correct positions with UTF-8, use mb_strpos instead of strpos:

function strpos_all($haystack, $needle) {
    $offset = 0;
    $allpos = array();
    while (($pos = mb_strpos($haystack, $needle, $offset)) !== FALSE) {
        $offset   = $pos + 1;
        $allpos[] = $pos;
    }
    return $allpos;
}
print_r(strpos_all("aaa bbb aaa bbb aaa bbb", "aa"));
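To show why the choice matters, the sketch below compares the offsets from strpos (bytes) with those from mb_strpos (characters) on a UTF-8 string. It assumes the mbstring extension is loaded; the function names byte_positions and char_positions are hypothetical:

```php
<?php
// Byte offsets, as returned by strpos().
function byte_positions(string $haystack, string $needle): array
{
    $positions = [];
    $offset = 0;
    while ($needle !== '' && ($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1;
    }
    return $positions;
}

// Character offsets, as returned by mb_strpos() (assumes mbstring and UTF-8 input).
function char_positions(string $haystack, string $needle): array
{
    $positions = [];
    $offset = 0;
    while ($needle !== '' && ($pos = mb_strpos($haystack, $needle, $offset, 'UTF-8')) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1;
    }
    return $positions;
}

$text = "\u{e9} asdf \u{e9} asdf"; // "é asdf é asdf"; é takes two bytes in UTF-8
print_r(byte_positions($text, "asdf")); // 3, 11
print_r(char_positions($text, "asdf")); // 2, 9
```

Whichever variant is used, the positions should be fed to the matching family of functions: byte offsets to substr, character offsets to mb_substr.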
Umair Khan
mangrove
1

Another solution is to use explode():

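// Shown as a plain function; add "public static" back when placing it inside a class.
function allSubStrPos($str, $del)
{
    // Every piece except the last one is followed by an occurrence of $del.
    $searchArray = explode($del, $str);
    array_pop($searchArray);
    $positionsArray = [];
    $index = 0;
    foreach ($searchArray as $s) {
        // A match starts right after the current piece.
        $positionsArray[] = strlen($s) + $index;
        $index += strlen($s) + strlen($del);
    }
    return $positionsArray;
}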
Amin.Qarabaqi
0

Simple strpos_all() function.

function strpos_all($haystack, $needle_regex)
{
    preg_match_all('/' . $needle_regex . '/', $haystack, $matches, PREG_OFFSET_CAPTURE);
    return array_map(function ($v) {
        return $v[1];
    }, $matches[0]);
}

Usage: Simple string as needle.

$html = "dddasdfdddasdffff";
$needle = "asdf";

$all_positions = strpos_all($html, $needle);
var_dump($all_positions);

Output:

array(2) {
  [0]=>
  int(3)
  [1]=>
  int(10)
}

Or with regex as needle.

$html = "dddasdfdddasdffff";
$needle = "[d]{3}";

$all_positions = strpos_all($html, $needle);
var_dump($all_positions);

Output:

array(2) {
  [0]=>
  int(0)
  [1]=>
  int(7)
}
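Because the needle above is inserted into the pattern as-is, a literal needle containing regex metacharacters (such as . or /) would be interpreted as a pattern. A minimal sketch of a literal-only variant using preg_quote follows; the function name strpos_all_literal is an assumption for illustration, and the offsets it returns are byte offsets:

```php
<?php
// Hypothetical literal-needle variant: preg_quote() escapes regex metacharacters
// and the "/" delimiter, so the needle is matched as plain text.
function strpos_all_literal(string $haystack, string $needle): array
{
    if ($needle === '') {
        return [];
    }
    $pattern = '/' . preg_quote($needle, '/') . '/';
    preg_match_all($pattern, $haystack, $matches, PREG_OFFSET_CAPTURE);
    // Each entry in $matches[0] is [matched text, byte offset]; keep only the offsets.
    return array_column($matches[0], 1);
}

print_r(strpos_all_literal("a.b/a.b axb", "a.b")); // 0, 4 ("axb" is not matched)
```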
Jalo
  • Using regular expressions to look for a substring is not a good approach. Of course you can do it but regex is for more complex scenarios. Using `strpos` is much simpler in this case and does the job. – omar jayed Apr 11 '20 at 04:44
  • 1
    A warning about getting the offset in a string that may have multibyte characters: [preg_match and UTF-8 in PHP](https://stackoverflow.com/q/1725227/2943403) – mickmackusa Mar 15 '22 at 03:36
0
<?php
$mainString = "dddjmnpfdddjmnpffff";
$needle = "jmnp";
$lastPos = 0;
$positions = array();

while (($lastPos = strpos($mainString, $needle, $lastPos))!== false) {
    $positions[] = $lastPos;
    $lastPos = $lastPos + strlen($needle);
}

// Displays 3 and 11
foreach ($positions as $value) {
    echo $value ."<br />";
}
?>
Dr. Nio