I'm trying to find strings that contain a domain. I have the following pattern:
"|s:\\d+:\\\\\"((?:.(?!s:\\d+))+?){$domain}(.+?)\\\\\";|"
The pattern seems to work, but in PHP I get only the first two matches.
$filename = "caciki_tr.sql";
$domain = "caciki.com.tr";
$domain = escape($domain, ".");
$content = file_get_contents($filename);
$pattern = "|s:\\d+:\\\\\"((?:.(?!s:\\d+))+?){$domain}(.+?)\\\\\";|";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
print_r($matches);
function escape($string, $chars) {
    $chars = str_split($chars);
    foreach ($chars as $char) {
        $string = str_replace($char, "\\{$char}", $string);
    }
    return $string;
}
Array
(
    [0] => Array
        (
            [0] => s:121:\"/home/caciki/domains/caciki.com.tr/public_html/wp-content/themes/rafine/woocommerce/single-product/product-thumbnails.php\";
            [1] => /home/caciki/domains/
            [2] => /public_html/wp-content/themes/rafine/woocommerce/single-product/product-thumbnails.php
        )

    [1] => Array
        (
            [0] => s:81:\"/home/caciki/domains/caciki.com.tr/public_html/wp-content/themes/rafine/style.css\";
            [1] => /home/caciki/domains/
            [2] => /public_html/wp-content/themes/rafine/style.css
        )

)
I get all 11 matches only when I tinker with the target file, so something in the file must be breaking the pattern in PHP.
I've tested the same pattern in Python and C#, and both give the correct result.
So what's wrong here?
caciki_tr.sql (target file)
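One way to narrow this down is to check whether PCRE gave up partway through the file. In PHP 7 and later, preg_match_all() returns false when the engine aborts, and preg_last_error() reports why. The sketch below is only a diagnostic under the unconfirmed assumption that a PCRE limit (backtracking or JIT stack) is involved. The helpers preg_error_name() and match_all_checked() are hypothetical names made up for illustration, and the ini_set() calls and the sample pattern exist only to force a failure for the demonstration.

```php
<?php
// Hypothetical helper: map preg_last_error() codes to readable names.
function preg_error_name(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
    ];
    return $names[$code] ?? "unknown ($code)";
}

// Hypothetical wrapper: run preg_match_all() and record how it ended.
function match_all_checked(string $pattern, string $subject): array
{
    $result = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    return [
        'ok'      => $result !== false,                 // false means PCRE aborted
        'error'   => preg_error_name(preg_last_error()),
        'matches' => $matches,                          // may be incomplete on failure
    ];
}

// Demonstration only: disable the JIT and lower the backtrack limit so that
// a pattern with heavy backtracking fails quickly and deterministically.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

$bad  = match_all_checked('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar');
$good = match_all_checked('/o+/', 'foobar foobar');

echo "bad:  ok=", var_export($bad['ok'], true), " error=", $bad['error'], "\n";
echo "good: ok=", var_export($good['ok'], true), " error=", $good['error'], "\n";
```

If the same check on the real pattern and caciki_tr.sql reports anything other than PREG_NO_ERROR, the truncated result would be explained by the engine stopping early rather than by the pattern itself.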
Update: The pattern here is used with different substrings (e.g., domain, URL, username). Not all strings in the target file follow the same pattern. For example, a pattern for URLs should be able to match the following:
$url = "http://[DOMAIN_OMITTED]/~caciki";
$pattern = "|s:\d+:\\\\\"([^s]*(?:s(?!:\d)[^s]*)*){$url}(.+?)\\\\\";|";
s:28:\"http://[DOMAIN_OMITTED]/~caciki\";
s:28:\"<a href=\"http://[DOMAIN_OMITTED]/~caciki\">some page</a>\";
In short, there may be no text between the s:28:\" prefix and the substring ($url), or after the substring, so both captured parts should be optional.
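To make "optional" concrete, here is a minimal sketch that assumes it is enough to switch both quantifiers from + to *, so either group may match the empty string. The host example.com is a placeholder (the real one is omitted above), the sample strings and their s:N lengths are illustrative, and preg_quote() stands in for the escape() helper.

```php
<?php
// Placeholder URL: the real host is omitted in the post.
$url = "http://example.com/~caciki";

// preg_quote() escapes regex metacharacters plus the "|" delimiter.
$quoted = preg_quote($url, "|");

// Same shape as the original pattern, but both groups use * instead of +,
// so the text before and after the URL may be empty.
$pattern = '|s:\d+:\\\\"((?:.(?!s:\d+))*?)' . $quoted . '(.*?)\\\\";|';

// Two sample values as they would appear in the dump (quotes escaped as \").
$content = 's:26:\"http://example.com/~caciki\";'
         . 's:50:\"<a href=\"http://example.com/~caciki\">some page</a>\";';

$count = preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

if ($count === false) {
    // The engine aborted; see preg_last_error() for the reason.
    echo "PCRE error code: ", preg_last_error(), "\n";
} else {
    foreach ($matches as $m) {
        // $m[1] is the text before the URL, $m[2] the text after it.
        echo "before=[", $m[1], "] after=[", $m[2], "]\n";
    }
}
```

With these two samples, the first value yields empty groups on both sides, and the second captures the surrounding anchor markup. This only addresses the optional parts and does not by itself explain the missing matches.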