This code opens every Excel file in a folder, extracts all the emails from each file, and puts them in an array. In the end I need ONE BIG array holding all the emails from all the files, not an array of arrays.

The code below is not working. I am sure this is a simple one. Thanks

<?

$Folder = "sjc/";
$files = scandir($Folder);


function cleanFolder($file)
{
$string = file_get_contents("sjc/$file");
$pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);

$Emails[] = $matches[0];
return $Emails;
}



function beginClean($files)
{
    for($i=0; count($files)>$i;$i++)
        {
        $Emails = cleanFolder("$files[$i]");
        $TheEmails .= explode(",",$Emails);

        }

/// Supposed to be a big string of emails separated by comma
echo $TheEmails; // But it just echos .... ArrayArrayArrayArrayArray etc...

// WHAT I REALLY WANT IS.. one Array holding all emails, not an Array of Arrays. 
}

beginClean($files);

?>

UPDATE: GOT IT TO WORK. HOWEVER, I am having a memory issue now, as the emails total over 229911.

Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 71 bytes) in /home/public_html/StatuesPlus/CleanListFolder.php on line 33

Here is the code that worked:

<?

$Folder = "sjc/";
$files = scandir($Folder);


function cleanFolder($file)
{
//echo "FILE NAME " . $file . "<br>";
$string = file_get_contents("sjc/$file");
$pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);

$TheEmails .= implode(',', $matches[0]);
return $TheEmails;

}



function beginClean($files)
{
    for($i=0; count($files)>$i;$i++)
        {
        $Emails .= cleanFolder("$files[$i]");
        }



$TheEmails = explode(",", $Emails);
//$UniqueEmails= array_unique($TheEmails);
echo count($TheEmails);
//file_put_contents("Emails.txt", $TheEmails);
}

beginClean($files);

?>
Papa De Beau
  • Instead of raw Excel files, at least convert to CSV; then getting the emails is a snap. Your regular expression will not match some valid email addresses. –  May 21 '13 at 01:58
  • Thanks Dagon, I was going to do that, but there are lots of Excel files and I only know how to do it manually. Also, this info has much more than emails; I am just taking the emails. Is there code to convert Excel to CSV via PHP? – Papa De Beau May 21 '13 at 02:01
  • Not via PHP, but from the command line: http://stackoverflow.com/questions/1858195/convert-xls-to-csv-on-command-line –  May 21 '13 at 02:08
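
For reference, once the spreadsheets have been exported to CSV, pulling the addresses out cell by cell might look like the sketch below. The function name `emailsFromCsv` is hypothetical, and the sketch assumes plain comma-separated files. It uses `filter_var` with `FILTER_VALIDATE_EMAIL` instead of the hand-written regex, and it reads one row at a time, so a whole file never has to sit in memory.

```php
<?php
// Hypothetical helper: collect valid email addresses from one CSV file.
// Assumes comma-separated values; reads one row at a time to keep memory low.
function emailsFromCsv(string $path): array
{
    $emails = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $emails; // unreadable file: nothing to collect
    }
    // Delimiter, enclosure and escape are passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        foreach ($row as $cell) {
            $cell = strtolower(trim((string) $cell));
            if (filter_var($cell, FILTER_VALIDATE_EMAIL) !== false) {
                $emails[$cell] = true; // keys act as a set, so duplicates collapse
            }
        }
    }
    fclose($handle);
    return array_keys($emails);
}
```

Calling `emailsFromCsv()` for each file and merging the results should give one flat array. Only unique addresses are kept along the way, which may also help with the memory limit mentioned in the update.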

2 Answers

.= is used for concatenating strings, not arrays. But you can just keep them as strings for a while:

$TheEmails .= ",$Emails";

And then:

$TheEmails = explode(',', substr($TheEmails, 1));
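
To show where those two lines sit relative to the loop, here is a minimal sketch. It assumes each file yields an array of matches (as `preg_match_all` does), so every array is imploded before being appended. `joinEmailLists` and the sample addresses are hypothetical.

```php
<?php
// Hypothetical helper: flatten per-file arrays of emails via a comma-joined string.
function joinEmailLists(array $perFileEmails): array
{
    $joined = '';
    foreach ($perFileEmails as $emails) {
        if ($emails === []) {
            continue; // skip files with no matches to avoid empty entries
        }
        $joined .= ',' . implode(',', $emails); // every chunk gets a leading comma
    }
    if ($joined === '') {
        return [];
    }
    return explode(',', substr($joined, 1)); // substr() drops the very first comma
}

$all = joinEmailLists([['a@example.com', 'b@example.com'], [], ['c@example.com']]);
print_r($all);
```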
Ry-
  • Fast. Thanks. Where exactly would I put this code? In the first loop or second? Also what does the substr do and the 1? – Papa De Beau May 21 '13 at 01:54
  • @PapaDeBeau: The `substr` just takes off the leading comma, since the first item gets the comma too. Anyways, `$TheEmails .= ",$Emails"` replaces the other `$TheEmails .= …` line, and `$TheEmails = explode…` goes after the loop. – Ry- May 21 '13 at 02:02
  • Thanks. I think there is still one issue. `$Emails = cleanFolder("$files[$i]");` turns $Emails into an array, so it's not actually the email but an array from the other loop. – Papa De Beau May 21 '13 at 02:08

Below is the final code I used to gather emails from multiple Excel sheets in any given folder. The files can be CSV, XLS, XLSX, HTML, etc., and this code will extract the emails from every file in that folder and put them into ONE HUGE ARRAY. :)

<?php
    // See below for the ARRAY output called $FinalEmails

    // SET YOUR FOLDER HERE

    $Folder = "sjc/";
    $files = scandir($Folder);


    // Returns the emails found in one file as a lowercase, comma-separated string.
    function cleanFolder($Folder, $file)
    {

    // $Folder is passed in because global variables are not visible inside a function
    $string = file_get_contents($Folder . $file);
    $pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
    preg_match_all($pattern, $string, $matches);

    $TheEmails = implode(',', $matches[0]);
    $TheEmails = strtolower($TheEmails);

    return $TheEmails;

    }


    function isValidEmail($email)
    {
     return filter_var(filter_var($email, FILTER_SANITIZE_EMAIL), FILTER_VALIDATE_EMAIL);
    }



    function beginClean($Folder, $files)
    {
        $Emails = "";
        for($i=0; count($files)>$i;$i++)
            {
            if (!is_file($Folder . $files[$i])) continue; // skip "." and ".." from scandir()
            // the comma keeps the last email of one file apart from the first of the next
            $Emails .= cleanFolder($Folder, $files[$i]) . ",";
            }



    $TheEmails = explode(",", $Emails);
    $UniqueEmails= array_unique($TheEmails);

    $Emails = implode(",", $UniqueEmails);


    $FinalEmails = "";
    // foreach, because array_unique() leaves gaps in the numeric keys
    foreach($UniqueEmails as $email)
    {
        if(isValidEmail($email))
        {
        echo $email . "<br>";
        $FinalEmails .= "$email,";
        }
    }


    /// An ARRAY of emails from multiple Excel sheets,
    // cleaned of duplicates and checked for validity.
    $FinalEmails = explode(",", $FinalEmails);

    return $FinalEmails;

    }

    $FinalEmails = beginClean($Folder, $files);

    ?>
Papa De Beau
  • Without `substr`, though, the last element of `$FinalEmails` will be empty. Also, you don’t have to convert from an array to a string to an array. Also, is `$Emails` used? – Ry- May 21 '13 at 13:19
  • It does work without substr. Not sure why. Yes, $Emails in this example is not used. Thanks for pointing that out. I will remove it. – Papa De Beau May 26 '13 at 21:42
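
The round trip from array to string and back can be skipped entirely. Here is a minimal sketch that reuses the regex from the question. `collectEmails` is a hypothetical name, and it is assumed to receive full file paths, for example built from the `scandir()` result.

```php
<?php
// Hypothetical helper: gather unique, valid emails from a list of files
// without converting arrays to strings and back.
function collectEmails(array $paths): array
{
    $pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
    $seen = [];
    foreach ($paths as $path) {
        if (!is_file($path)) {
            continue; // scandir() also returns "." and ".."
        }
        preg_match_all($pattern, file_get_contents($path), $matches);
        foreach ($matches[0] as $email) {
            $email = strtolower($email);
            if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
                $seen[$email] = true; // array keys deduplicate as we go
            }
        }
    }
    return array_keys($seen); // one flat array, no empty trailing element
}
```

Because duplicates are dropped as each file is processed, only the unique addresses are held in memory. That may ease the memory limit from the question, although each file is still loaded whole by `file_get_contents`.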