Yesterday I put together a parser that takes single-line inputs from PHP's file() function and splits each line into fields (code shown below). I'm using file() instead of fopen() so as not to lock the files in question.
While reviewing other solutions, I came across greg.kindel's comment on this post, which says that any solution using splits or pattern matching is doomed to fail: Javascript code to parse CSV data
I realize that kindel is answering a question about parsing an entire CSV file (line breaks included), so this is a slightly different application, but I would still like to validate my method. The only regex I use strips non-printable characters from each line; it is not used to parse out individual fields. Am I overlooking something by using splits this way?
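To make the "line breaks included" difference concrete, here is a minimal sketch that is separate from my parser. The sample data and variable names are made up for illustration, and passing an empty escape string to fgetcsv() assumes PHP 7.4 or later. It writes a two-record CSV whose second record has a line break inside a quoted field, then compares what file() and fgetcsv() see.

```php
<?php
// Hypothetical sample data: a header plus one record whose quoted field contains a line break.
$csv = "id,comment\n1,\"first line\nsecond line\"\n";

$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, $csv);

// file() splits on physical line breaks, so the quoted field is cut in two.
$physical_lines = file($path);

// fgetcsv() reads logical records and keeps the embedded line break inside the field.
$records = array();
$handle = fopen($path, 'r');
while (($row = fgetcsv($handle, 0, ',', '"', '')) !== false) {
    $records[] = $row;
}
fclose($handle);
unlink($path);

echo count($physical_lines), ' physical lines, ', count($records), ' records', PHP_EOL;
```

So any parser that starts from file() sees the quoted field as two separate rows before the field logic ever runs; whether that matters depends on whether the input can contain multi-line fields.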
Code:
function read_csv($fname = '', $use_headers = true)
{
    if(strlen($fname) >= 5 && substr($fname, strlen($fname)-4, 4) == '.csv')
    {
        $data_array = array();
        $headers = array();
        # Parse file into individual rows
        $rows = file($fname);
        # Iterate through rows to parse fields.
        for($i = 0; $i < count($rows); $i++)
        {
            # Remove non-printable characters
            $rows[$i] = preg_replace('/[[:^print:]]/', '', $rows[$i]);
            # Split string by commas
            $split = explode(',', $rows[$i]);
            $text = array();
            $count = 0;
            $fields = array();
            # Iterate through split string
            foreach($split as $key => $value)
            {
                # Count cumulative number of quotation marks
                $count += strlen($value) - strlen(str_replace('"', '', $value));
                # Build field string
                $text[] = $value;
                # When cumulative number of quotation marks is even, save string and start new field
                if($count % 2 == 0)
                {
                    # Reinsert commas in string
                    $result = implode(',', $text);
                    # Remove escape quotes from fields encapsulated by quotes
                    # Convert double-quotation marks to single
                    if(substr($result, 0, 1) == '"')
                    {$result = str_replace('""', '"', substr($result, 1, strlen($result)-2));}
                    $fields[] = $result;
                    $count = 0;
                    $text = array();
                }
            }
            # Write $fields to associative array, headers optional
            if($i == 0 && $use_headers)
            {
                foreach($fields as $key => $header)
                {$headers[$key] = $header;}
            } else {
                $tmp = array();
                foreach($fields as $key => $value)
                {
                    if($use_headers)
                    {$tmp[$headers[$key]] = $value;}
                    else
                    {$tmp[] = $value;}
                }
                $data_array[] = $tmp;
            }
        }
        return $data_array;
    } else {
        # If provided filename is not a csv file, return an error
        # Uses the same associative array format as $data_array
        return array(0 => array('Error' => 'Invalid filename', 'Filename' => $fname));
    }
}
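
One way to validate the method on single lines is to isolate the quote-counting split and compare it against PHP's built-in str_getcsv(). The sketch below is only that: split_csv_line() is a hypothetical helper name, it uses substr_count() in place of the strlen()/str_replace() difference, the sample lines are made up, and the empty escape string passed to str_getcsv() assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: the quote-counting split from read_csv(), isolated for a single line.
function split_csv_line($line)
{
    $fields = array();
    $text = array();
    $count = 0;
    foreach (explode(',', $line) as $piece) {
        // Running total of quotation marks seen since the last completed field
        $count += substr_count($piece, '"');
        $text[] = $piece;
        // An even total means every opened quote has been closed, so the field is complete
        if ($count % 2 == 0) {
            $result = implode(',', $text);
            if (substr($result, 0, 1) == '"') {
                // Drop the enclosing quotes and collapse doubled quotes
                $result = str_replace('""', '"', substr($result, 1, strlen($result) - 2));
            }
            $fields[] = $result;
            $count = 0;
            $text = array();
        }
    }
    return $fields;
}

// Made-up sample lines covering plain fields, quoted commas, escaped quotes, and empty fields
$samples = array(
    'a,b,c',
    '"Smith, John",42,"said ""hi"""',
    ',,',
    '"",x',
);
foreach ($samples as $line) {
    $mine = split_csv_line($line);
    $builtin = str_getcsv($line, ',', '"', '');
    echo ($mine === $builtin ? 'match   ' : 'MISMATCH'), ' ', $line, PHP_EOL;
}
```

Agreement on a handful of samples doesn't prove the approach correct; it only gives a quick way to hunt for inputs where the two disagree, such as malformed quoting or whitespace around quoted fields.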