5

I have a CSV file I'm importing but am running into an issue. The data is in the format:

TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4

I need to explode on the commas that are not inside the quotes.

Armstrongest
Ben
  • One thing I would try is, if you can, change the input file so that EVERYTHING is in quotes, and then you can explode by `","` after pulling off the first and last quotes. That way, it won't explode the commas not immediately next to quotes. Of course, that's only if you don't want to use `fgetcsv` like Artefacto suggests and you want to challenge yourself with it. – Jeff Rupert May 14 '10 at 15:21
  • I can't make everything wrapped in quotes, it's exported that way through another system. – Ben May 14 '10 at 15:25
  • Are quotes only possible on the second field? – Armstrongest May 14 '10 at 15:30
  • According to commonly accepted CSV spec, quotes are optional, and only required to disambiguate a field that has quotes, commas or multiple lines in it. http://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules – beporter May 14 '10 at 15:42
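
The quoting rule beporter describes can be sketched in a few lines of PHP. This is only an illustration: `csvQuoteField` is a hypothetical helper name, not a PHP built-in, and it assumes a comma delimiter and double-quote qualifier. It quotes a field only when the field contains a comma, a double quote, or a line break, and doubles any embedded quotes.

```php
<?php
// Hypothetical helper (not a PHP built-in): quote a field only when the
// commonly accepted CSV rules require it.
function csvQuoteField($field)
{
    // No delimiter, quote or line break in the field: leave it bare.
    if (strpbrk($field, ",\"\r\n") === false) {
        return $field;
    }
    // Otherwise wrap it in quotes and double any embedded quotes.
    return '"' . str_replace('"', '""', $field) . '"';
}

$fields = array('TEST 690', 'This is a test 1, 2 and 3', '$14.95', '4');
echo implode(',', array_map('csvQuoteField', $fields)), "\n";
```

Only the second field comes out quoted here, which matches the shape of the exported line in the question.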

2 Answers

10

See the fgetcsv function.

If you already have a string, you can create a stream that wraps it and then use fgetcsv. See http://code.google.com/p/phpstringstream/source/browse/trunk/stringstream.php
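
As a rough sketch of the stream approach, PHP's built-in `php://temp` stream can stand in for the linked wrapper class. `parseCsvLine` is a hypothetical name chosen for this example, and the trimming step is an assumption about what the asker wants, since the sample line has stray spaces around its delimiters. On PHP 5.3 and later, `str_getcsv` parses a string directly without the stream.

```php
<?php
// Sketch: parse one CSV line held in a string by wrapping it in a temporary
// in-memory stream. parseCsvLine is a hypothetical name, not a PHP built-in.
function parseCsvLine($line)
{
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $line);
    rewind($handle);
    // Delimiter, enclosure and escape are passed explicitly rather than
    // relying on the defaults.
    $fields = fgetcsv($handle, 0, ',', '"', '\\');
    fclose($handle);
    return $fields;
}

// The line from the question; fields are trimmed after parsing.
$row = parseCsvLine('TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4');
print_r(array_map('trim', $row));
```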

Artefacto
  • I'd rather use a regex because there is special functionality here – Ben May 14 '10 at 15:24
  • 6
    Don't do it with regex. It's not as simple as it looks. You may have line breaks in the strings. You may have escaped characters. – Artefacto May 14 '10 at 15:32
  • 2
    Once the CSV is parsed (by fgetcsv) you can regex-process each individual field to your heart's content. – Roadmaster May 14 '10 at 15:36
  • One has to note that `fgetcsv` has the problem of eating special characters if they are the first letter of a string value, so you sometimes just have to work around it. – m90 Sep 13 '12 at 08:05
  • See: http://stackoverflow.com/questions/12390851/php-is-eating-the-first-letter-of-a-string-if-its-an-umlaut/ – m90 Sep 13 '12 at 08:21
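
Artefacto's point about line breaks inside quoted strings can be checked with a small sketch. The two-row sample data is made up for illustration; the second row has a quoted field that spans two physical lines, which `fgetcsv` reads as a single record.

```php
<?php
// Sketch: fgetcsv keeps reading across line breaks while inside a quoted field.
$handle = fopen('php://temp', 'r+');
fwrite($handle, "id,note\n1,\"first line\nsecond line\"\n");
rewind($handle);

$rows = array();
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($handle);

var_dump($rows);
```

A regex or `explode("\n", ...)` approach that splits on line breaks first would cut that second record in half before any field parsing starts.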
5

If you really want to do this by hand, here's a rough reference implementation I wrote to explode a complete line of CSV text into an array. Be warned: This code does NOT handle multiple-line fields! With this implementation, the entire CSV row must exist on a single line with no line breaks!

<?php
//-----------------------------------------------------------------------
function csvexplode($str, $delim = ',', $qual = "\"")
// Explode a single CSV string (line) into an array.
{
    $len = strlen($str);  // Store the complete length of the string for easy reference.
    $inside = false;  // Maintain state when we're inside quoted elements.
    $lastWasDelim = false;  // Maintain state if we just started a new element.
    $word = '';  // Accumulator for current element.
    $out = array();  // Collected elements.

    for($i = 0; $i < $len; ++$i)
    {
        // We're outside a quoted element, and the current char is a field delimiter.
        if(!$inside && $str[$i]==$delim)
        {
            $out[] = $word;
            $word = '';
            $lastWasDelim = true;
        } 

        // We're inside a quoted element, the current char is a qualifier, and the next char is a qualifier.
        elseif($inside && $str[$i]==$qual && ($i+1<$len && $str[$i+1]==$qual))
        {
            $word .= $qual;  // Add one qual into the element,
            ++$i; // Then skip ahead to the next non-qual char.
        }

        // The current char is a qualifier (so we're either entering or leaving a quoted element.)
        elseif ($str[$i] == $qual)
        {
            $inside = !$inside;
        }

        // We're outside a quoted element, the current char is whitespace and the 'last' char was a delimiter.
        elseif( !$inside && ($str[$i]==" ")  && $lastWasDelim)
        {
            // Just skip the char because it's leading whitespace in front of an element.
        }

        // Outside a quoted element, the current char is whitespace, the "next" char is a delimiter.
        elseif(!$inside && ($str[$i]==" ")  )
        {
            // Look ahead for the next non-whitespace char.
            $lookAhead = $i+1;
            while(($lookAhead < $len) && ($str[$lookAhead] == " ")) 
            {
                ++$lookAhead;
            }

            // If we hit the end of the line or the next char is formatting, we're dealing with trailing whitespace.
            if($lookAhead >= $len || $str[$lookAhead] == $delim || $str[$lookAhead] == $qual) 
            {
                $i = $lookAhead-1;  // Jump the pointer ahead to right before the delimiter or qualifier.
            }

            // Otherwise we're still in the middle of an element, so add the whitespace to the output.
            else
            {
                $word .= $str[$i];  
            }
        }

        // If all else fails, add the character to the current element.
        else
        {
            $word .= $str[$i];
            $lastWasDelim = false;
        }
    }

    $out[] = $word;
    return $out;
}


// Examples:

$csvInput = 'Name,Address,Phone
Alice,123 First Street,"555-555-5555"
Bob,"345 Second Place,   City  ST",666-666-6666
"Charlie ""Chuck"" Doe",   3rd Circle   ,"  777-777-7777"';

// explode() emulates file() in this context.
foreach(explode("\n", $csvInput) as $line)
{
    var_dump(csvexplode($line));
}
?>

I would still recommend relying on PHP's built-in function, though. It is (hopefully) going to be far more reliable in the long term. Artefacto and Roadmaster are right: anything you have to do to the data is best done after you import it.
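
A minimal sketch of that parse-first, process-later order, assuming PHP 5.3+ for `str_getcsv`. The input is a tidied version of the question's line, and the two post-processing steps (numeric price, numeric product code) are hypothetical examples of the "special functionality" a regex might be wanted for.

```php
<?php
// Sketch: let the built-in parser split the line, then apply regular
// expressions to individual fields.
$line = 'TEST 690,"This is a test 1, 2 and 3",$14.95,4';
$row = array_map('trim', str_getcsv($line, ',', '"', '\\'));

// Hypothetical post-processing: keep only digits and the decimal point
// of the price column.
$price = (float) preg_replace('/[^0-9.]/', '', $row[2]);

// Hypothetical post-processing: pull the first run of digits out of the
// product code.
preg_match('/\d+/', $row[0], $matches);
$code = (int) $matches[0];

var_dump($row[1], $price, $code);
```

The commas inside the description never reach the regular expressions, so the patterns can stay simple.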

beporter