I've been trying to set up a simple PHP API that will essentially retrieve information from another site in two steps. If a person were to do it, it would involve:
- Searching the site
- Clicking on the first result
- Finding the information
The site is set up in a predictable way. I know the format of its search URL, so I can build that URL in PHP from the input to the API.
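Building the search URL could look roughly like the sketch below. The function name `buildSearchUrl` is hypothetical, and the base URL and the `query` parameter name are assumptions taken from the placeholder URL used later in this post, not the real site's format.

```php
<?php
// Hypothetical helper: builds the search URL from the API input.
// Assumption: the site takes the search term in a single "query" parameter.
function buildSearchUrl(string $term): string
{
    // http_build_query URL-encodes the value, so spaces and symbols are safe.
    return 'https://www.example.com/?' . http_build_query(['query' => $term]);
}

echo buildSearchUrl('some query'), PHP_EOL;
```

Encoding the term this way avoids broken URLs when the input contains spaces or characters such as `&`.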
The link for steps 1/2 is formatted like this:
<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>
I only want the somelinkhere, the hyperlink itself. I know that it is the first hyperlink on the page contained within an <h4>.
I tried a number of regular expressions in combination with preg_match, but they have all failed. For example, the following is one attempt that did not work:
$url = "https://www.example.com/?query=somequery";
$input = @file_get_contents($url) or die("Could not access file: $url");
preg_match_all('/<h4><a [^>]*\bhref\s*=\s*"\K[^"]*[^"]*/', $text, $results);
echo "$results";
echo "$results[0]";
echo "$results[0][0]";
I added the last three echoes because I'm not very familiar with the format preg_match_all returns. I tried preg_match as well, with the same result. I only care about the first such link, so I don't need preg_match_all, but getting just its first result would also work.
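For reference, here is a small sketch of what the two functions return, run against the sample markup from this post (the second heading and the variable names `$html`, `$all`, and `$first` are made up for illustration). Two details in the snippet above are worth noting: the page is stored in `$input` while the pattern is applied to `$text`, and inside a double-quoted string `"$results[0][0]"` only interpolates the first index, so it prints `Array[0]` rather than the match.

```php
<?php
// Sample markup copied from the question; the second heading is added for illustration.
$html = '<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>'
      . '<h4><a href="anotherlink" class="search_result_title">Other</a></h4>';

// Same idea as the pattern in the question, without the duplicated [^"]*.
$pattern = '/<h4><a [^>]*\bhref\s*=\s*"\K[^"]*/';

// preg_match_all: with the default PREG_PATTERN_ORDER, $all[0] is the list of full matches.
$count = preg_match_all($pattern, $html, $all);

// preg_match: $first[0] is the full match of the first occurrence only.
preg_match($pattern, $html, $first);

// Arrays cannot be echoed directly; print_r shows their structure.
print_r($all);
echo $all[0][0], PHP_EOL; // first match from preg_match_all
echo $first[0], PHP_EOL;  // the same value from preg_match
```

When a nested element has to appear inside a double-quoted string, the braced form `"{$all[0][0]}"` interpolates the full expression.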
What is the best way to parse the page and get the first hyperlink in the h4 into a variable?
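One possible approach, offered as a sketch rather than a definitive answer, is to parse the page with PHP's built-in DOM extension instead of a regular expression. The helper name `firstH4Link` and the sample string are hypothetical; the XPath expression assumes the link is a descendant of the `<h4>`, as in the markup shown above.

```php
<?php
// Hypothetical helper: returns the href of the first <a> inside an <h4>, or null if none exists.
function firstH4Link(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely well-formed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Select every <a> with an href that sits inside an <h4>, in document order.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//h4//a[@href]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('href');
}

// Sample markup copied from the question, wrapped in a minimal page.
$sample = '<html><body><h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4></body></html>';
$link = firstH4Link($sample);
echo $link, PHP_EOL;
```

In the real script, the string returned by file_get_contents would be passed in place of `$sample`.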
`… Some Text Here; preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE, 0); // Print the entire match result echo $matches[0][0]; ?>` – Code Maniac Sep 17 '19 at 01:27