
I am working on scraping and then parsing an HTML string to get the two URL parameters inside the href. After scraping the element I need, $description, the full string ready for parsing is:

<a target="_blank" href="CoverSheet.aspx?ItemID=18833&amp;MeetingID=773">Description</a><br>

Below I use the explode function to split the $description string on the = delimiter. I then try to split the resulting piece again on the double quote delimiter.

Problem I need to solve: I want to print only the digits of the MeetingID parameter, i.e. everything before the double quote: 773.

<?php
echo "Description is: " . htmlentities($description); // prints the string referenced above
$htarray = explode('=', $description); // explode the $description string which includes the link. ... then, find out where the MeetingID is located
echo $htarray[4] .  "<br>"; // this will print the string which includes the meeting ID: "773">Description</a><br>"

$meetingID = $htarray[4];
echo "Meeting ID is " . substr($meetingID,0,3); 
?>

The above echo statement using substr works to print the meeting ID, 773.

However, I want to make this bulletproof: if the MeetingID value ever exceeds 999, it will need 4 or more characters. That's why I want to split on the double quote instead, so that it prints all digits before the double quote, however many there are.

Below I try to isolate everything before the double quote, but it isn't working correctly yet.

<?php
 $htarray = explode('"', $meetingID); // split the $meetingID string based on the " delimiter
 echo "Meeting ID0 is " . $meetingID[0] ; // this prints just the first number, 7
 echo "Meeting ID1 is " . $meetingID[1] ; // this prints just the second number, 7
 echo "Meeting ID2 is " . $meetingID[2] ; // this prints just the third number, 3

?>

Question: why does $meetingID[0] print only a single digit rather than the THREE digits before the " delimiter? If the explode function works properly, shouldn't it split the string referenced above on the double quote into just two elements? The string is

"773">Description</a><br>"

So I can't understand why, when echoing after the explode with the double quote delimiter, it only prints one digit at a time.

Daniel C
  • "why is the array $meetingID[0] not printing the THREE numbers before the delimiter" -- because `$meetingID` is the string. The exploded array is `$htarray`. I think you're looking for `$htarray[0]`? – rickdenhaan Sep 27 '20 at 18:55
  • you are right! Thank you, problem solved. – Daniel C Sep 27 '20 at 19:07
  • if you can write that as an answer I can give you the correct response. – Daniel C Sep 27 '20 at 19:07
  • You would normally be better off processing HTML with something like DOMDocument, see https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php. – Nigel Ren Sep 27 '20 at 19:16
  • @NigelRen thanks I am using the PHP Simple Dom Parser but trying to get more advanced once I have the strings. – Daniel C Sep 27 '20 at 19:19

2 Answers

There is a very easy way to do it:

Your string:

$str ='<a target="_blank" href="CoverSheet.aspx?ItemID=18833&amp;MeetingID=773">Description</a><br>';

Take the substring that starts at ItemID and ends just before the closing ">:

$params = substr( $str, strpos( $str, 'ItemID'), strpos( $str, '">') - strpos( $str, 'ItemID') );

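You will get a substring like this (the &amp; entity from the HTML is still encoded at this point; html_entity_decode() turns it into a plain &):

ItemID=18833&amp;MeetingID=773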

Now do whatever you want to do!
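
For instance, one way to get from that substring to the individual values is to decode the entity and hand the result to parse_str(). This is only a minimal sketch under the assumption that the string always has the shape shown in the question; the variable names $start, $end, $query and $values are made up for illustration.

```php
<?php
$str = '<a target="_blank" href="CoverSheet.aspx?ItemID=18833&amp;MeetingID=773">Description</a><br>';

// Same substring as above, with the two positions named for readability.
$start  = strpos($str, 'ItemID');
$end    = strpos($str, '">');
$params = substr($str, $start, $end - $start); // ItemID=18833&amp;MeetingID=773

// Decode &amp; to & first; otherwise parse_str() would produce
// a key named "amp;MeetingID" instead of "MeetingID".
$query = html_entity_decode($params);

// Split the query string into an associative array of name => value.
parse_str($query, $values);

echo "Item ID is " . $values['ItemID'] . "\n";       // Item ID is 18833
echo "Meeting ID is " . $values['MeetingID'] . "\n"; // Meeting ID is 773
```

Because parse_str() reads each value up to the next & or the end of the string, this keeps working when MeetingID grows beyond three digits.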

SOS9GS

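You're getting the wrong result because you're indexing the wrong variable. explode() returns its result in $htarray, while $meetingID is still the original string, so $meetingID[0] is just the first character of that string.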

$htarray = explode('"', $meetingID);

echo "Meeting ID0 is " . $meetingID[0] ; // this prints just the first number, 7
echo "Meeting ID1 is " . $meetingID[1] ; // this prints just the second number, 7
echo "Meeting ID2 is " . $meetingID[2] ; // this prints just the third number, 3

echo "Meeting ID is " . $htarray[0] ; // this prints 773

There's an easier way to do this though, using regular expressions:

$description = '<a target="_blank" href="CoverSheet.aspx?ItemID=18833&amp;MeetingID=773">Description</a><br>';

$meetingID = "Not found";
if (preg_match('/MeetingID=([0-9]+)/', $description, $matches)) {
    $meetingID = $matches[1];
}

echo "Meeting ID is " . $meetingID;
// this prints 773 or Not found if $description does not contain a (numeric) MeetingID value
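
A more structural alternative, along the lines of the DOMDocument suggestion in the comments on the question, is to let the HTML parser read the href and let parse_url() and parse_str() split the query string. The following is a rough sketch rather than a drop-in solution: it assumes PHP's DOM extension is available and that the first <a> element in $description is the link of interest; $doc, $links, $href, $query and $params are illustrative names.

```php
<?php
$description = '<a target="_blank" href="CoverSheet.aspx?ItemID=18833&amp;MeetingID=773">Description</a><br>';

// Collect libxml warnings internally instead of printing them;
// the input is a fragment, not a complete HTML document.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($description);
libxml_clear_errors();

$meetingID = "Not found";
$links = $doc->getElementsByTagName('a');
if ($links->length > 0) {
    // getAttribute() returns the decoded value, so &amp; is already a plain &.
    $href = $links->item(0)->getAttribute('href');

    // Extract only the part after the "?".
    $query = parse_url($href, PHP_URL_QUERY);
    if (is_string($query)) {
        parse_str($query, $params);
        if (isset($params['MeetingID'])) {
            $meetingID = $params['MeetingID'];
        }
    }
}

echo "Meeting ID is " . $meetingID;
// this prints 773 for the string above, or Not found if there is no link or no MeetingID parameter
```

This avoids depending on the order of the attributes or on the exact position of the quotes in the markup.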
rickdenhaan