46

I know how to use preg_match and preg_match_all to find the matches of a regex pattern in a given string, but the function I am writing needs not only the text of the matches but also the ability to traverse the string around them.

Therefore, I need to know the position of the match in the string, based on a regex pattern.

I can't seem to find a function similar to strpos() that allows regex...any ideas?

johnnietheblack
  • once you have the match, can't you just use `strpos()` to find its position? – scibuff Mar 05 '12 at 17:38
  • Once you have your matches, you can then use strpos() to find the position within the string. – SenorAmor Mar 05 '12 at 17:38
  • 4
    @scibuff - well...sorta, but the regex may have lots of matches, and lots of different kinds of matches...which would mean i'd be adding a decent amount of passes on the string if i had to use more functions. – johnnietheblack Mar 05 '12 at 17:44
  • 1
    @SenorAmor nope, unless you can assume that no two matches are identical. – matteo Aug 09 '18 at 19:42

3 Answers

91

You can use the flag PREG_OFFSET_CAPTURE for that:

preg_match('/bar/', 'Foobar', $matches, PREG_OFFSET_CAPTURE);
var_export($matches);

Result is:

array (
  0 => 
  array (
    0 => 'bar',
    1 => 3,     // <-- the string offset of the match
  ),
)
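If the pattern can match several times, the same flag also works with preg_match_all. The following is only a sketch, assuming PHP 7.1+ for the array destructuring; the sample string and the variable names ($subject, $found) are made up for illustration. It collects each offset together with the text before and after the match:

```php
<?php
// Hypothetical sample input with two matches of /bar/.
$subject = 'Foobar and bar';

// With PREG_OFFSET_CAPTURE every match becomes [matched text, byte offset].
preg_match_all('/bar/', $subject, $matches, PREG_OFFSET_CAPTURE);

$found = [];
foreach ($matches[0] as [$text, $offset]) {
    $found[] = [
        'offset' => $offset,
        // Everything in the subject before the match.
        'before' => substr($subject, 0, $offset),
        // Everything in the subject after the match.
        'after'  => substr($subject, $offset + strlen($text)),
    ];
}

var_export($found);
```

This needs only one regex pass over the string, and identical matches still get distinct offsets.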

In a previous version, this answer included a capture group in the regular expression (preg_match('/(bar)/', ...)). As evident in the first few comments, this was confusing to some and has since been edited out by @Mikkel. Please ignore these comments.
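For anyone who does want the position of a capture group, here is a small sketch using the `/Foo(bar)/` pattern from the comments (assuming PHP 7.1+ for the destructuring; the variable names are illustrative only). Index 0 holds the whole match and index 1 holds the first group, each with its own offset:

```php
<?php
preg_match('/Foo(bar)/', 'Foobar', $matches, PREG_OFFSET_CAPTURE);

// $matches[0] is the whole match, $matches[1] the first capture group.
[$whole, $wholeOffset] = $matches[0];
[$group, $groupOffset] = $matches[1];

echo "$whole at $wholeOffset, $group at $groupOffset\n";
// prints "Foobar at 0, bar at 3"
```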

Linus Kleen
  • 1
    Why it returned the result two times? – Moradnejad Jun 11 '17 at 06:25
  • 4
    @ananda This first match is *all* of the regular expression. The second is the first capture group. Coincidentally, both are the same in the above code. If, for example, the regex was `/Foo(bar)/`, then the first result would be `'Foobar'` with an offset of zero and the second result would be `'bar'` (with offset three) as above. – Linus Kleen Jun 11 '17 at 14:44
  • The manual says: `for every occurring match the appendant string offset (in bytes) will also be returned`. What about UTF-8 or other multi-byte encoding, how does it work? – Rodrigo Oct 31 '18 at 00:51
  • Re: the conversation between Moradnejad and Linus Kleen: I've edited the answer to remove the subpattern for clarity. – Mikkel Sep 13 '19 at 13:30
  • @Mikkel Thank you for your suggestion. I had to roll back to the previous version, though. While you're right - the example could well have lived without the superfluous capture group - the comments cannot be deleted or revised and would irritate future readers. – Linus Kleen Sep 15 '19 at 19:33
  • @LinusKleen I appreciate your perspective, but the argument of "we should keep the answer confusing because people were confused by it and said so" strikes me as self-defeating. The entire point of editing is to clarify answers in response to feedback. – Mikkel Sep 26 '19 at 15:19
  • @Mikkel You are right; no point in arguing any further. I rolled back to your revision and allowed for a section addressing the comments instead. – Linus Kleen Sep 28 '19 at 13:24
  • 1
    I actually got here by searching "php get regex group position" so your original answer is helpful to some folks. That said, the note you added at the end was enough to send me on the right path. – Obscerno Nov 27 '19 at 20:11
  • @Rodrigo have the same question as you. – Charles Bao Jan 17 '20 at 17:33
5

preg_match has an optional flag, PREG_OFFSET_CAPTURE, that records the string position of the match's occurrence in the original 'haystack'. See the 'flags' section: http://php.net/preg_match
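Note that the recorded position is a byte offset, not a character offset, which matters for multi-byte encodings such as UTF-8. As a rough sketch (assuming the mbstring extension is available and PHP 7+ for the `\u{...}` escape; the sample string is made up), a byte offset can be converted to a character offset by measuring the prefix that precedes the match:

```php
<?php
// "héllo bar": the "é" occupies two bytes in UTF-8.
$subject = "h\u{00E9}llo bar";

preg_match('/bar/u', $subject, $m, PREG_OFFSET_CAPTURE);

// Offset as reported by PCRE, counted in bytes.
$byteOffset = $m[0][1];

// Count the characters in the prefix before the match instead.
$charOffset = mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');

echo "bytes: $byteOffset, characters: $charOffset\n";
// prints "bytes: 7, characters: 6"
```

The byte offset is the one to pass to substr() or back into preg_match; the character offset is only needed for mb_* functions or for display.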

Marc B
0

preg_match() returns the number of times the pattern matched (0 or 1). With PREG_OFFSET_CAPTURE, a successful match also carries its offset in the string, counted from 0.

Using this value, you can call preg_match() again with the offset parameter to continue searching after the previous match.
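A minimal sketch of that loop, assuming PHP 7.1+ and a made-up sample string; the names $subject, $positions and $offset are illustrative only:

```php
<?php
$subject   = 'foo bar foo baz foo';
$positions = [];
$offset    = 0;

// The fifth argument tells preg_match where (in bytes) to start searching.
while (preg_match('/foo/', $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
    [$text, $pos] = $m[0];
    $positions[] = $pos;

    // Resume just past this match; advance at least one byte so an
    // empty match cannot cause an infinite loop.
    $offset = $pos + max(strlen($text), 1);
}

var_export($positions);
```

For this input the collected positions are 0, 8 and 16. When every match is needed up front, preg_match_all with the same flag is probably simpler than looping.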

PoX