I'm trying to read through a CSV file and extract the data using several regular expressions. I do not have access to the content of the imported CSV file.

However, one or more lines may be empty. I can detect those with the trim() function. The problem is how to adapt my various arrays so that the empty lines are kept.

In my sample file, two lines have no phone number. How can I detect that, and how can I insert these empty values into my $phones array?

For example, if I do :

foreach($fullNames as $fullName)
{
    echo $fullName."<br>";
}

foreach($phones as $phone)
{
    echo $phone."<br>";
}

The result will be :

{Marc Vador, Marc Vador, Marc Vador, Marc Vador, Marc Vador}

{0692 10 10 10, 0692 10 10 10, 0692 10 10 10}

what I want to achieve is this :

{Marc Vador, Marc Vador, Marc Vador, Marc Vador, Marc Vador}

{0692 10 10 10, , 0692 10 10 10, , 0692 10 10 10}

$emptyValue = "";

if (($handle = fopen($loadedSheetName.'.csv', "r")) !== FALSE) 
{ 
    fgetcsv($handle);

    while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) 
    {   
        $col = count($data);

        for($c = 0; $c < $col; $c++)
        {
            $phones = array();
            $mails = array();
            $zipcodes = array();
            $fullNames = array();

            if ('' === trim($data[$c]))
            {
                $emptyValue = "";
            }

            if(preg_match('/^(0)(692|693|262)(\d{6})$/', $data[$c], $matches))
            {
                $phones[] = "+262".$matches[2].$matches[3];
            }

            if(preg_match('/^(0)(692|693|262)( )(\d{2})( )(\d{2})( )(\d{2})$/', $data[$c], $matches))
            {
                $phones[] = "+262".$matches[2].$matches[4].$matches[6].$matches[8];
            }

            if(preg_match('/^(0)(692|693|262)( )(\d{2})( )(\d{2})( )(\d{2})(\/)(0)(692|693|262)( )(\d{2})( )(\d{2})( )(\d{2})$/', $data[$c], $matches))
            {
                $phones[] = "+262".$matches[2].$matches[4].$matches[6].$matches[8].$matches[9]."+262".$matches[11].$matches[13].$matches[15].$matches[17];
            }

            if(preg_match('/^([^\W][a-zA-Z0-9_]+)(\.[a-zA-Z0-9_]+)*(\@)([a-zA-Z0-9_]+)*(\.[a-zA-Z]{2,4})$/', $data[$c], $matches))
            {
                $mails[] = $matches[0];
            } 

            if(preg_match('/^(Sainte|Saint|saint|sainte)(-)([a-zA-z]+)$/', $data[$c], $matches))
            {
                $zipcodes[] = $matches[0];
            }

            if(preg_match('/^(([a-zA-Z\W]+)( )([a-zA-Z\W]+))$/', $data[$c], $matches))
            {
                $fullNames[] = $matches[0];
            }

            if(preg_match('/^(([a-zA-Z\W]+)( )([a-zA-Z\W]+)( )([a-zA-Z\W]+))$/', $data[$c], $matches))
            {  
                $fullNames[] = $matches[0];
            }
        }
    }

    fclose($handle);
}
Maestro
  • Is `$emptyValue = "";` where you want to do this? – Harvey Fletcher Apr 05 '19 at 10:47
  • Possible duplicate of [fgetcsv skip blank lines in file](https://stackoverflow.com/questions/18324369/fgetcsv-skip-blank-lines-in-file) – A J Apr 05 '19 at 10:48
  • The above linked question will give you the solution you need. – A J Apr 05 '19 at 10:48
  • Hmm, I looked at the linked question, but I don't want to ignore blank lines – Maestro Apr 05 '19 at 10:53
  • You don't want to ignore, but at least you have your condition there. – A J Apr 05 '19 at 10:55
  • Btw `$phones = array()` is overwritten on every iteration. – u_mulder Apr 05 '19 at 10:57
  • @HarveyFletcher, @A J, I edited my question to show you an example of result – Maestro Apr 05 '19 at 11:42
  • I don't understand why a phone number like `0692123456` becomes `+262692123456`? – Toto Apr 05 '19 at 12:27
  • @Toto, this is one of the constraints my company has set – Maestro Apr 05 '19 at 12:32
  • Then there is a mismatch between your code and your examples. The CSV contains `0692 10 10 10`; after running the script you'll get `+262692101010` in the array `$phones`, but you want to display `0692 10 10 10`. How is that possible? And how can a zip code begin with `Saint-`? – Toto Apr 05 '19 at 12:44
  • Hmm @Toto, I'm displaying the formatted phone number at the end! And the zip code part was a mistake, I meant city! – Maestro Apr 05 '19 at 12:59

1 Answer

What you have shown us here contains a lot of bad practices. It is hard to advise how you should be addressing the problem when we don't know what the problem is; the end result is surely not to populate some PHP arrays - these are just an interim storage mechanism.

Your question is somewhat confusing too - a "line" in relation to a CSV file describes a record, and a record is composed of fields (or sometimes attribute values depending on the nature of the CSV file). From your narrative, what you describe as a "line" is a field or an attribute value.

Each record in the CSV file retains the association between its component fields by the line on which it appears. But fields themselves can contain embedded line breaks if they are quoted or escaped.
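To illustrate the point about embedded line breaks, here is a small sketch. It assumes an in-memory stream (`php://memory`) and made-up sample data in place of your real file; the `note` column is hypothetical.

```php
<?php
// Build a small CSV in memory. The second field of the data record is
// quoted and contains a line break. (Sample data is made up.)
$handle = fopen('php://memory', 'w+');
fwrite($handle, "name;note\n\"Marc Vador\";\"first line\nsecond line\"\n");
rewind($handle);

$records = [];
// Arguments: handle, length (0 = no limit), separator, enclosure, escape.
while (($data = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
    $records[] = $data;
}
fclose($handle);

// Number of records read, which is not the number of physical lines.
echo count($records), "\n";
```

The stream holds three physical lines, but fgetcsv() should return two records (the header and one data record), because the quoted field keeps its line break. This is why the record, not the physical line, is the unit to reason about.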

By not populating the blank values into your interim representation you are breaking this association.

what I want to achieve is this

So you want to retain the blank values - not skip them. So add a blank value to the array.
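A minimal sketch of that idea, assuming a hypothetical `$rows` array stands in for the records fgetcsv() returns and using a simplified phone pattern rather than your full set: every record contributes exactly one slot to `$phones`, and the slot stays `''` when no field matches.

```php
<?php
// Hypothetical sample records standing in for what fgetcsv() would return.
$rows = [
    ['Marc Vador', '0692 10 10 10'],
    ['Marc Vador', ''],
    ['Marc Vador', '0692 10 10 10'],
];

$phones = [];
foreach ($rows as $fields) {
    $phone = '';  // default: this record has no phone number
    foreach ($fields as $field) {
        // Simplified pattern for illustration only.
        if (preg_match('/^0(692|693|262)( \d{2}){3}$/', trim($field))) {
            $phone = trim($field);
        }
    }
    $phones[] = $phone;  // always append, even when the value is blank
}

print_r($phones);
```

Because one value is appended per record regardless of whether a match was found, `$phones` keeps the same length and order as the records in the file.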

Currently your code is very badly structured and buggy. You are simply adding elements to the end of different arrays - not only do you have a problem with blank fields, but if you make a mistake when adding conditions you will lose synchronization of the arrays for non-blank data.

The result will be :

No it won't. The code you have shown us resets the output arrays for every field it reads (the initialisation sits inside the for loop), so they never hold more than the matches from the last field processed.

If you move

        $phones = array();
        $mails = array();
        $zipcodes = array();
        $fullNames = array();

outside the while loop, you will get something close to what you describe.

The normal way to fix the problem is to use else if to make each of your matching conditions exclusive:

        if ('' === trim($data[$c]))
        {
            $emptyValue = "";
        }
        else if(preg_match('/^(0)(692|693|262)(\d{6})$/', $data[$c], $matches))
        {
            $phones[] = "+262".$matches[2].$matches[3];
        }
        else if(preg_match('/^(0)(692|693|262)( )(\d{2})( )(\d{2})( )(\d{2})$/', $data[$c], $matches))
        {
            $phones[] = "+262".$matches[2].$matches[4].$matches[6].$matches[8];
        ...

But your code is currently written to accommodate the fields being presented in any order within the record. While that may really be the case, it is a very unusual scenario, and one which is predicated on all the fields being present (not the case) and not duplicated. If you have such a lack of structure in your input file, you are wasting your time writing code to automate parsing the data - even if you fix this problem, you will encounter further pain. Garbage in, garbage out.

However as a purely academic exercise, if we accept the implied predicates are enforced, it is still trivial to solve. Just track the original record association in your code:

if (($handle = fopen($loadedSheetName.'.csv', "r")) !== FALSE) { 
  $phones = array();
  $mails = array();
  $zipcodes = array();
  $fullNames = array();
  $record=0;
  fgetcsv($handle);
  while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {   
    $record++;
    $col = count($data);
    for($c = 0; $c < $col; $c++) {
        if(preg_match('/^(0)(692|693|262)(\d{6})$/', $data[$c], $matches))
        {
           $phones[$record] = "+262".$matches[2].$matches[3];
        }
   ...
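Once the arrays are keyed by record number, the missing slots can be filled in afterwards. The following is only a sketch: `fillGaps()` is a hypothetical helper (not a built-in), and the sample `$phones` array stands in for what the loop above would produce for five records where records 2 and 4 had no phone number.

```php
<?php
// Hypothetical helper: give every record number 1..$recordCount a slot,
// using '' where no value was captured for that record.
function fillGaps(array $values, int $recordCount): array
{
    $filled = [];
    for ($i = 1; $i <= $recordCount; $i++) {
        $filled[$i] = $values[$i] ?? '';
    }
    return $filled;
}

// Sample data: records 2 and 4 had no phone number.
$phones = [1 => '+262692101010', 3 => '+262692101010', 5 => '+262692101010'];
$phones = fillGaps($phones, 5);

foreach ($phones as $record => $phone) {
    echo $record, ': ', $phone, "<br>";
}
```

In the real loop, the second argument would be the final value of `$record`, and the same call can be applied to `$mails`, `$zipcodes` and `$fullNames` so that all four arrays line up by record.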
symcbean