I am trying to get all the unique emails from an HTML page into an array. The file is HUGE and the emails follow no consistent pattern that would let me pick them out easily.
Here is an example HTML file called GetEmails.html. The actual file will have CSS and much more code to sift through. In this example, notice the different ways the emails appear. In short, not all of them are separated by spaces; some are separated by commas, semicolons, etc.
<html>
<body>
<p>This is some text and here is an email me@myemail.com and in this text we will see lots of emails like hello@hotmail.com; mike@hello.com, Bill@John.com or even dot orgs too like ed@wisdom.org and all types such as bill@hot.tv,mary@Mary.us and even Obama@yikes.gov some might be bold Ed@Ed.com and some will look like this Email:<strong>Ed@myemail.com</strong>
</p>
<p><u>There will be pages and pages and pages of text to sift thru so get the emails into an array.</u></p>
<p>This is some text and here is an email me@myemail.com and in this text we will see lots of emails like hello@hotmail.com; mike@hello.com, Bill@John.com or even dot orgs too like ed@wisdom.org and all types such as bill@hot.tv,mary@Mary.us and even Obama@yikes.gov some might be bold Ed@Ed.com and some will look like this Email:<strong>Ed@myemail.com</strong> and repeat This is some text and here is an email me@myemail.com and in this text we will see lots of emails like hello@hotmail.com; mike@hello.com, Bill@John.com or even dot orgs too like ed@wisdom.org and all types such as bill@hot.tv,mary@Mary.us and even Obama@yikes.gov some might be bold Ed@Ed.com and some will look like this Email:<strong>Ed@myemail.com</strong></p>
<p> </p>
</body>
</html>
I thought about using explode() on spaces, but that might not work and might use up too many resources. I am just wondering if there is a simple function in PHP to help me get all the emails into an array. Here is what I tried.
<?php
$lines = file('GetEmails.html');
$TheEmails = array();
foreach ($lines as $line_num => $line) {
    // Checks whether the line contains an email.
    if (preg_match('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i', $line)) {
        // Splits the line (with tags stripped) into an array on spaces
        $words = explode(" ", strip_tags($line));
        // Keeps only the items that contain an @ sign
        $fl_array = preg_grep("/@/", $words);
        // Adds each of those items to the list of emails
        foreach ($fl_array as $item) {
            $TheEmails[] = trim($item);
        }
    }
}
// Keeps only the unique emails in an array
$UniqueEmails = array_unique($TheEmails);
?>
The code above works; however, I am afraid that on the HUGE file I will use, it will consume resources unnecessarily. Also, it will not account for emails separated by commas, like this: ed@ed.com,mike@mike.com
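To make the comma problem concrete, here is a minimal sketch of one possible direction, not a definitive solution: instead of splitting on spaces, run preg_match_all with the same pattern as in the attempt above, so the separator between emails no longer matters. The function name extractUniqueEmails and the $sample lines are hypothetical and only serve as illustration. The sketch assumes that addresses differing only in letter case should count as different emails, and it keeps the {2,4} limit from the original pattern, which would miss longer top-level domains.

```php
<?php
// Hypothetical helper (not part of the original attempt): collects the
// unique email-like strings found in a list of lines.
function extractUniqueEmails($lines)
{
    // Same pattern as in the attempt above, matched case-insensitively.
    $pattern = '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i';
    $seen = array();
    foreach ($lines as $line) {
        // preg_match_all returns every match in the line, so emails glued
        // together by commas, semicolons, or tags are still found one by one.
        if (preg_match_all($pattern, $line, $matches)) {
            foreach ($matches[0] as $email) {
                // Using the email as the array key removes duplicates as we go.
                $seen[$email] = true;
            }
        }
    }
    return array_keys($seen);
}

// Made-up sample lines mixing the separators described above.
$sample = array(
    'hello@hotmail.com; mike@hello.com, Bill@John.com',
    'bill@hot.tv,mary@Mary.us Email:<strong>Ed@myemail.com</strong>',
    'ed@ed.com,mike@mike.com and again mike@hello.com',
);
$emails = extractUniqueEmails($sample);
print_r($emails);
```

With these sample lines the result holds eight addresses, with the repeated mike@hello.com kept only once. For the real file, the same function could be given the result of file('GetEmails.html'); passing an SplFileObject instead, which can be iterated line by line in a foreach, might avoid loading the whole file into memory at once, though I have not measured whether that makes a difference here.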
Any ideas on the best way to do this? At the very least, it would be VERY VERY helpful to learn the best way to do this, even if I can only get the emails that are separated by spaces, etc.
Hope this makes sense. Thanks so much!