
I am working on a CSV parser in PHP.

The parsed $data array holds the header row from the submitted file as its first element, which should be saved as $header. The remaining rows hold the actual data and should be saved into a $body array.

Is there a concise syntax to split this $data variable into $header and $body?

The code so far is as follows. The CSV file is parsed into a $data array which holds the header and the body data in a single variable.

$csvFile = file($file_name);
$data = [];
foreach ($csvFile as $line) {
    // strip the UTF-8 byte order mark (BOM) that files created on Windows may start with
    $line = str_replace("\xEF\xBB\xBF",'',$line);
    $data[] = str_getcsv($line);
}

Edit:

This is the sample CSV data I am working with:

"uid","area","office","tel"
"6936","SEL","Branch 1","080-123-4567"
"6935","SEL","Branch 2","080-123-4567"
"6934","SEL","Branch 3","080-123-4567"
"6933","SEL","Branch 4","080-123-4567"
"6932","SEL","Branch 5","080-123-4567"
"6931","SEL","Branch 6","080-123-4567"
"6930","SEL","Branch 7","080-123-4567"
"6929","SEL","Branch 8","080-123-4567"
"6928","SEL","Branch 9","080-123-4567"
"6927","SEL","Branch 10","080-123-4567"
"6926","SEL","Branch 11","080-123-4567"

The intended results should be as follows:

$data = [
    ["uid", "area", "office", "tel"],
    ["6936", "SEL", "Branch 1", "080-123-4567"],
    ["6935", "SEL", "Branch 2", "080-123-4567"],
    ["6934", "SEL", "Branch 3", "080-123-4567"],
    ["6933", "SEL", "Branch 4", "080-123-4567"],
    ["6932", "SEL", "Branch 5", "080-123-4567"],
    ["6931", "SEL", "Branch 6", "080-123-4567"],
    ["6930", "SEL", "Branch 7", "080-123-4567"],
    ["6929", "SEL", "Branch 8", "080-123-4567"],
    ["6928", "SEL", "Branch 9", "080-123-4567"],
    ["6927", "SEL", "Branch 10", "080-123-4567"],
    ["6926", "SEL", "Branch 11", "080-123-4567"]
];
$header = ["uid", "area", "office", "tel"];
$body = [
    ["6936", "SEL", "Branch 1", "080-123-4567"],
    ["6935", "SEL", "Branch 2", "080-123-4567"],
    ["6934", "SEL", "Branch 3", "080-123-4567"],
    ["6933", "SEL", "Branch 4", "080-123-4567"],
    ["6932", "SEL", "Branch 5", "080-123-4567"],
    ["6931", "SEL", "Branch 6", "080-123-4567"],
    ["6930", "SEL", "Branch 7", "080-123-4567"],
    ["6929", "SEL", "Branch 8", "080-123-4567"],
    ["6928", "SEL", "Branch 9", "080-123-4567"],
    ["6927", "SEL", "Branch 10", "080-123-4567"],
    ["6926", "SEL", "Branch 11", "080-123-4567"]
];
K.H.
  • I think you're looking for [array_shift](https://www.php.net/array_shift) – Dale Sep 28 '22 at 08:15
  • Please show us sample data for `$data` and, based on that sample, the outcome you expect. – Alive to die - Anant Sep 28 '22 at 08:16
  • Why not use any of the existing libraries to handle this? – Nico Haase Sep 28 '22 at 08:24
  • Trying to do this with `file` and `str_getcsv` is _dangerous_, it will fail if any of the columns in the CSV would ever contain a line break in its value. `fgetcsv` is written to take that into account, but `file` just hacks the file apart at _any_ line breaks it finds. – CBroe Sep 28 '22 at 08:52
  • _"to filter out the white-space characters that gets added into the CSV files created on Windows."_ - it looks like you are not talking about just any arbitrary whitespace, but in fact actually a BOM, specifically (because that's what you are replacing, an UTF-8 BOM.) And unless that makes its way into individual column contents via copy&paste (and the program used to edit the files doesn't correct that), it should only be at the very beginning of the file. And in that case, I would use `fgetcsv`, and only trim it from the very first value you read. – CBroe Sep 28 '22 at 08:57
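
The `array_shift` suggestion from the comments can be sketched as follows. This assumes `$data` has already been parsed into an array of rows as in the question (shortened here to three rows); `$header2` and `$body2` are hypothetical names used only to show a second, non-destructive variant.

```php
<?php
// Sample rows in the same shape as the question's $data (shortened).
$data = [
    ["uid", "area", "office", "tel"],
    ["6936", "SEL", "Branch 1", "080-123-4567"],
    ["6935", "SEL", "Branch 2", "080-123-4567"],
];

// Option 1: array_shift() removes the first row from the array and returns it.
// Work on a copy so that $data itself stays intact.
$body = $data;
$header = array_shift($body);

// Option 2: leave $data untouched and slice off everything after the first row.
$header2 = $data[0] ?? [];
$body2 = array_slice($data, 1);
```

Both variants give the same `$header` and `$body`. Calling `array_shift($data)` directly is shorter still, but it modifies `$data`, so it only fits if the combined array is no longer needed.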

1 Answer


Use fgetcsv(); it makes this quite simple.

It reads the CSV file one row at a time. Read the first row into $header before the loop, then read the remaining rows into $body inside the loop.

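$header = [];
$body = [];
if (($fh = fopen($file_name, "r")) !== FALSE) {
    // Skip a UTF-8 BOM at the very start of the file, if present.
    // The double quotes matter: in single quotes \xEF is not a byte escape.
    if (fread($fh, 3) !== "\xEF\xBB\xBF") {
        rewind($fh);
    }

    // The first row is the header
    $header = fgetcsv($fh, 1000, ",");

    // Every remaining row belongs to the body
    while (($row = fgetcsv($fh, 1000, ",")) !== FALSE) {
        $body[] = $row;
    }
    fclose($fh);
}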

The manual page for fgetcsv()
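
As a rough, self-contained sketch of the same idea, the helper below wraps the fgetcsv() loop in a function and runs it against a temporary file. The function name `readCsvWithHeader`, the temporary file and its contents are assumptions made for illustration, not part of the original code. The sample deliberately starts with a UTF-8 BOM and contains a quoted value with a line break, the two cases raised in the comments.

```php
<?php
// Hypothetical helper: reads a CSV file and returns [$header, $body].
function readCsvWithHeader(string $fileName): array
{
    $fh = fopen($fileName, 'r');
    if ($fh === false) {
        return [[], []];
    }

    // Skip a UTF-8 BOM if the file starts with one; otherwise go back to byte 0.
    if (fread($fh, 3) !== "\xEF\xBB\xBF") {
        rewind($fh);
    }

    // First row is the header; length 0 means no line-length limit.
    $header = fgetcsv($fh, 0, ',', '"', '\\');
    $header = $header === false ? [] : $header;

    // All remaining rows form the body.
    $body = [];
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        $body[] = $row;
    }
    fclose($fh);

    return [$header, $body];
}

// Demo file: BOM + header row + a row whose second value spans two lines.
$tmp = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents(
    $tmp,
    "\xEF\xBB\xBF\"uid\",\"office\"\n\"6936\",\"Branch 1\nAnnex\"\n\"6935\",\"Branch 2\"\n"
);

[$header, $body] = readCsvWithHeader($tmp);
unlink($tmp);
```

Skipping the BOM bytes before the first fgetcsv() call, rather than trimming them from the parsed value afterwards, keeps the opening quote of the first field at the start of the data, so the first header value should be parsed like any other quoted field.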

RiggsFolly
  • I'm working with string data in order to filter out the white-space characters that get added to CSV files created on Windows. Since the end users will mostly be non-tech-savvy people, I'm trying to do the file sanitization on the server side instead of teaching the end users how to clean up the files before uploading. I am open to changing my current code to use fgetcsv if this can be done with that method. – K.H. Sep 28 '22 at 08:30
  • Of course, fgetcsv reads the column from the csv into an array, you just have to do your tidying on the element or elements of the $row array that contain the issues – RiggsFolly Sep 28 '22 at 08:33
  • I changed the code. I didn't test the clean-up loop, so if it's a problem, ping me again – RiggsFolly Sep 28 '22 at 08:35
  • I fixed the `str_replace()`. It would not have been doing anything anyway with a pattern using backslashes when placed in a double-quoted string, as the backslash would have been treated as an escape character – RiggsFolly Sep 28 '22 at 08:43
  • Thanks. I'm off work for now, but I will try this method when I get back. – K.H. Sep 28 '22 at 09:11