I am moving some blog-style articles to a new third-party home and need to replace some existing URLs with new ones. I cannot use XML, and am being forced to use a wrapper class that requires this search to happen in regex. I'm currently having trouble writing a regex that matches the URLs in the HTML. For example, if the HTML is:
<h1><a href="http://www.website.com/article/slug-that-has-undetermined-amount-of-hyphens/12345">Whatever</a></h1>
I need my regex to return:
http://www.website.com/article/slug-that-has-undetermined-amount-of-hyphens/12345
The beginning of the URL never changes (the "http://www.website.com/article/" part). However, I have no clue what the slug phrases are going to be; I only know they will contain an unknown number of hyphens between the words. The ID number at the end of the URL could be any integer.
There are multiple links of this type in each article, and there are also other types of URLs in the article that I want to be sure are ignored, so I can't just look for phrases starting with http inside of quotes.
FWIW: I'm working in PHP and am currently trying to use preg_match_all to return an array of the URLs needed.
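To make the target concrete, here is a minimal, self-contained sketch of the behavior I'm after. It rests on assumptions I haven't verified against the real articles: that slugs contain only letters, digits, and hyphens, and that the ID is digits only. The names $sample_html and $pattern and the "about" link are made up for illustration.

```php
<?php
// Hypothetical sample input: one article link that should match,
// plus one other link on the same site that should be ignored.
$sample_html = '<h1><a href="http://www.website.com/article/slug-that-has-undetermined-amount-of-hyphens/12345">Whatever</a></h1>'
    . '<p><a href="http://www.website.com/about">About</a></p>';

// Assumption: slug = letters, digits, hyphens; ID = one or more digits.
// Using ~ as the delimiter avoids escaping every forward slash.
$pattern = '~http://www\.website\.com/article/[A-Za-z0-9-]+/\d+~';

$matches = [];
$count = preg_match_all($pattern, $sample_html, $matches);

// With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full matches.
var_dump($count, $matches[0]);
```

On this sample I would expect a count of 1 and only the article URL in $matches[0].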
Here's my latest attempt:
$array_of_urls = [];
preg_match_all('/http:\/\/www\.website\.com\/article\/[^"]*/', $variable_with_html, $array_of_urls);
var_dump($array_of_urls);
And then I get nada dumped out. Any help appreciated!!!