Yesterday I put together a parser that takes the lines returned by PHP's file() function and splits each one into fields (code below). I'm using file() instead of fopen() so as not to lock the files in question.
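For reference, file() keeps each line's trailing newline by default. In the code below, it is the [[:^print:]] regex that removes it. Here is a minimal sketch showing the documented FILE_IGNORE_NEW_LINES and FILE_SKIP_EMPTY_LINES flags doing that directly. The temporary file and its contents are made up for illustration.

```php
<?php
// Illustrative temporary file: a header, a blank line, and one data row.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "a,b\n\n1,2\n");

// Default: every element keeps its "\n", and the blank line is an element too.
$plain = file($path);   // ["a,b\n", "\n", "1,2\n"]

// With flags: newlines are dropped and empty lines are skipped.
$trimmed = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);   // ["a,b", "1,2"]

unlink($path);
```

Without the flags, the blank line reaches read_csv() as "\n". The regex reduces that to an empty string, so the line becomes a data row with one empty field.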

I'm reviewing other solutions and came across greg.kindel's comment on the question "Javascript code to parse CSV data". It says that any solution using splits or pattern matching is doomed to fail.

I realize that kindel is answering a question about parsing an entire CSV file (line breaks included), so this is a slightly different application. Still, I would like to validate my method. The only regex I use strips non-printable characters from each line; it is not used to parse out individual fields. Am I overlooking something by using splits this way?
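This illustrates kindel's point in a line-based setting. A quoted CSV field may legally contain a line break, so anything that first cuts the input into physical lines (as file() does) sees half a record. The minimal sketch below uses an in-memory stream (php://memory) and made-up sample data. It compares a plain newline split with fgetcsv(), which keeps reading past the line break until the closing quote.

```php
<?php
// Two records; the "note" field of the second one contains a line break.
$csv = "id,note\n1,\"first line\nsecond line\"\n";

// Line-based view, like file(): the quoted field is cut in half.
$lines = explode("\n", rtrim($csv, "\n"));
// ['id,note', '1,"first line', 'second line"']

// Stream-based view: fgetcsv() continues until the enclosure is closed.
$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);
$records = [];
// An empty $escape disables PHP's non-standard backslash escape (PHP 7.4+).
while (($row = fgetcsv($handle, 0, ',', '"', '')) !== false) {
    $records[] = $row;
}
fclose($handle);
// [['id', 'note'], ['1', "first line\nsecond line"]]
```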

Code:

function read_csv($fname = '', $use_headers = true)
{
    if(strlen($fname) >= 5 && substr($fname, strlen($fname)-4, 4) == '.csv')
    {
        $data_array = array();
        $headers = array();

        # Parse file into individual rows
        # Iterate through rows to parse fields.
        $rows = file($fname);
        for($i = 0; $i < count($rows); $i++)
        {
            # Remove non-printable characters
            # Split string by commas
            $rows[$i] = preg_replace('/[[:^print:]]/', '', $rows[$i]);

            $split = explode(',', $rows[$i]);
            $text = array();
            $count = 0;
            $fields = array();

            # Iterate through split string
            foreach($split as $key => $value)
            {
                # Count cumulative number of quotation marks
                # Build field string
                # When cumulative number of quotation marks is even, save string and start new field
                $count += strlen($value) - strlen(str_replace('"', '', $value));
                $text[] = $value;
                if($count % 2 == 0)
                {
                    # Reinsert commas in string
                    # Remove escape quotes from fields encapsulated by quotes
                    # Convert double-quotation marks to single
                    $result = implode(',', $text);
                    if(substr($result, 0, 1) == '"')
                        {$result = str_replace('""', '"', substr($result, 1, strlen($result)-2));}
                    $fields[] = $result;
                    $count = 0;
                    $text = array();
                }
            }

            # Write $fields to associative array, headers optional
            if($i == 0 && $use_headers)
            {
                foreach($fields as $key => $header)
                    {$headers[$key] = $header;}
            } else {
                $tmp = array();
                foreach($fields as $key => $value)
                {
                    if($use_headers)
                        {$tmp[$headers[$key]] = $value;} 
                    else
                        {$tmp[] = $value;}
                }
                $data_array[] = $tmp;
            }
        }
        return $data_array;
    } else {
        # If provided filename is not a csv file, return an error
        # Uses the same associative array format as $data_array
        return array(0 => array('Error' => 'Invalid filename', 'Filename' => $fname));
    }
}
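One way to check the splitting logic is to isolate it and compare it with PHP's built-in str_getcsv(), which parses a single CSV string. The sketch below re-implements the inner loop of read_csv() as a standalone function. The name split_by_quote_parity is hypothetical, and substr_count() replaces the strlen/str_replace quote count with the same result. On well-formed single-line records it agrees with str_getcsv(). When a quoted field contains a line break, however, each half arrives as its own line and the quote count never becomes even, so the unfinished field is silently dropped.

```php
<?php
// Hypothetical standalone version of the per-line splitting in read_csv().
function split_by_quote_parity(string $line): array
{
    $fields = [];
    $pieces = [];
    $quotes = 0;
    foreach (explode(',', $line) as $piece) {
        // Same count as strlen($value) - strlen(str_replace('"', '', $value)).
        $quotes += substr_count($piece, '"');
        $pieces[] = $piece;
        if ($quotes % 2 === 0) {
            // Even quote count: the pieces collected so far form one field.
            $field = implode(',', $pieces);
            if (substr($field, 0, 1) === '"') {
                // Drop the enclosing quotes and unescape doubled quotes.
                $field = str_replace('""', '"', substr($field, 1, -1));
            }
            $fields[] = $field;
            $pieces = [];
            $quotes = 0;
        }
    }
    // Any pieces left here belong to a field whose closing quote never arrived.
    return $fields;
}

$samples = ['a,b,c', '"x, y",z', '"say ""hi""",2'];
$agree = [];
foreach ($samples as $line) {
    // Compare against the built-in parser (empty $escape: RFC 4180 quoting only).
    $agree[] = split_by_quote_parity($line) === str_getcsv($line, ',', '"', '');
}
// $agree is [true, true, true]

// file() splits this record in two because its quoted second field contains "\n".
$firstHalf = split_by_quote_parity('1,"first line');    // ['1']: the partial field is lost
$secondHalf = split_by_quote_parity('second line",2');  // []: nothing is returned
```

In read_csv() itself, that record would come out as a row holding only the first field, followed by an empty row.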
  • Why? 50 lines of code to replace a loop and https://www.php.net/manual/en/function.fgetcsv.php ??? – AbraCadaver Jan 31 '20 at 18:58
  • @AbraCadaver Maybe they're stuck with PHP 3.x or something? hah – MonkeyZeus Jan 31 '20 at 19:02
  • @MonkeyZeus only after PHP 5.4 can you use arrays like this: `$data_array[] = $tmp;` – Boopathi D Jan 31 '20 at 19:06
  • @DroidDev It was rhetorical, but thanks... – MonkeyZeus Jan 31 '20 at 19:07
  • @MonkeyZeus just now checked the meaning of rhetorical ;) – Boopathi D Jan 31 '20 at 19:09
  • Honestly, because I wanted to see if I could write the parsing function myself. But also because I started on this before I realized I could avoid file locking with fopen() by adding the read-only parameter. By the time I discovered that, I was already working on this function and wanted to finish it. – Zigabutanoid Jan 31 '20 at 19:14
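This follows up on the fgetcsv() suggestion and the read-only fopen() remark. It is a minimal sketch of the same interface built on fgetcsv(), assuming comma-separated, double-quoted input. The function name read_csv_with_fgetcsv is hypothetical. Error handling is simplified: the function returns an empty array instead of the 'Error' row. Rows whose field count differs from the header are kept as plain indexed arrays rather than guessed at. As far as I know, PHP's fopen() does not lock the file itself; locking is opt-in through flock().

```php
<?php
// Hypothetical fgetcsv()-based alternative to read_csv().
function read_csv_with_fgetcsv(string $fname, bool $use_headers = true): array
{
    // 'r' = read-only; no lock is taken unless flock() is called.
    $handle = fopen($fname, 'r');
    if ($handle === false) {
        return [];
    }
    $rows = [];
    $headers = null;
    // fgetcsv() handles commas, doubled quotes and line breaks inside quoted fields.
    while (($fields = fgetcsv($handle, 0, ',', '"', '')) !== false) {
        if ($fields === [null]) {
            continue; // a blank line comes back as [null]
        }
        if ($use_headers && $headers === null) {
            $headers = $fields;
            continue;
        }
        // Key by header only when the counts match (array_combine() requires it).
        if ($use_headers && count($fields) === count($headers)) {
            $fields = array_combine($headers, $fields);
        }
        $rows[] = $fields;
    }
    fclose($handle);
    return $rows;
}

// Usage with a made-up file: a quoted comma, a blank line, and a quoted line break.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,note\nAnn,\"hello, world\"\n\nBob,\"two\nlines\"\n");
$rows = read_csv_with_fgetcsv($path);
unlink($path);
// [['name' => 'Ann', 'note' => 'hello, world'], ['name' => 'Bob', 'note' => "two\nlines"]]
```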
