I'm building a PHP script that sifts through the HTML returned by a cURL request and matches URL patterns, so that I can append a GET parameter to each URL to track outbound links.
I have a Regex pattern that works, but I can't get it to match more than once; it won't even find a duplicate of the item it does match.
This is the sample HTML; the pattern currently matches only the first anchor tag:
`<html><head>
<title></title>
</head>
<body class="body class">
<div>
<a title="1hubwhrrstn" href="http://www.example.com?tag=9qgbc"></a>
<a name=""></a>
<a class="3hubwhbbsrstn" href="http://www.example.com?tag=uqgibc"></a>
<a class="4whbihbw4bsetrrstn" href="http://www.example.com?tag=9uq4i"></a>
<a href="http://www.example.com?tag=9uq4i" class="4whbihbstn"></a>
</div></body>
</html>`
The Regex pattern I'm using is `/(<a.*href=".*".*><\/a>)+/im`, and it's only matching the first anchor instance.
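For anyone who wants to reproduce this, here is a minimal PHP sketch. The post doesn't say which `preg_*` function runs the pattern, so the sketch compares `preg_match()` and `preg_match_all()` on the anchor lines from the sample. The variable names are hypothetical, `$html` stands in for the cURL response body, and a leading `/` delimiter is assumed on the pattern.

```php
<?php
// Hypothetical stand-in for the body returned by the cURL request.
$html = <<<'HTML'
<div>
<a title="1hubwhrrstn" href="http://www.example.com?tag=9qgbc"></a>
<a name=""></a>
<a class="3hubwhbbsrstn" href="http://www.example.com?tag=uqgibc"></a>
<a class="4whbihbw4bsetrrstn" href="http://www.example.com?tag=9uq4i"></a>
<a href="http://www.example.com?tag=9uq4i" class="4whbihbstn"></a>
</div>
HTML;

// The pattern from the question, with a leading "/" delimiter assumed.
$pattern = '/(<a.*href=".*".*><\/a>)+/im';

// preg_match() stops after the first match by design.
$first = preg_match($pattern, $html, $single);

// preg_match_all() keeps scanning and collects every match.
$total = preg_match_all($pattern, $html, $all);

echo "preg_match found: {$first}\n";
echo "preg_match_all found: {$total}\n";
print_r($all[0]);
```

On regex101.com the equivalent of `preg_match_all()` is the global (`g`) flag. Without it, the tester also reports only the first match.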
Also, I can't find a way to tell it to match anchors whether they're on separate lines or all on one line. When multiple anchor tags sit on the same line, it gives me a single match that runs them all together, even though I'm using a capturing group to limit the pattern to one anchor tag. So in this case it finds one match, even for the doubled anchors on the same line:
`<html><head>
<title></title>
</head>
<body class="body class">
<div>
<a title="1tn" href="http://www.example.com"></a><a class="3htn" href="http://www.example.com"></a>
<a name=""></a>
<a class="4whbihbw4bsetrrstn" href="http://www.example.com?tag=9uq4i"></a>
<a href="http://www.example.com?tag=9uq4i" class="4whbihbstn"></a>
</div></body>
</html>`
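The same-line behaviour can be isolated with a small PHP sketch. It assumes the cause is the greedy `.*`, which expands to the end of the line before backtracking. The `$bounded` pattern is a hypothetical variant for comparison, not a tested fix for every kind of markup: it uses negated character classes so that a match cannot run past the end of one opening tag.

```php
<?php
// The doubled-anchor line from the sample HTML.
$line = '<a title="1tn" href="http://www.example.com"></a>'
      . '<a class="3htn" href="http://www.example.com"></a>';

// The pattern from the question, with a leading "/" delimiter assumed.
$greedy = '/(<a.*href=".*".*><\/a>)+/im';

// Hypothetical variant: [^>]* stays inside the opening tag and
// [^"]* stays inside the href value, so each anchor matches separately.
$bounded = '/<a\b[^>]*\bhref="[^"]*"[^>]*><\/a>/i';

$greedyCount  = preg_match_all($greedy, $line, $g);
$boundedCount = preg_match_all($bounded, $line, $b);

echo "greedy matches: {$greedyCount}\n";
echo "bounded matches: {$boundedCount}\n";
print_r($b[0]);
```

With the greedy pattern, the single match is the entire line. The bounded variant returns each anchor as its own match.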
I've spent two hours tinkering and double-checking flags and quantifiers, testing as I go on regex101.com, and I can't figure out where I'm making a mistake.
Any help would be great. Thanks so much!