5

I'm using fgetcsv to read csv files in my application. Problem is that I don't know if my users will use ,, |, or ; as delimiters. If I tell fgetcsv nothing it will only accept , but if I tell it to use a specific delimiter it will only accept that.

What I'm doing now is:

while (($row = fgetcsv($fileHandle, 4096)) !== false) { 

What I want is something like this:

while (($row = fgetcsv($fileHandle, 4096, array(',', ';', '|'))) !== false) {

Is that possible in some easy way? It seems really weird that you would have to limit your delimiter to one specific character, since CSV is not standardized.

Christoffer
  • 7,470
  • 9
  • 39
  • 55
  • 1
    I think you will run into a similar problem as this one: http://stackoverflow.com/q/3395267 <- Because if you will find a comma and a semicolon in your csv file, which one would you choose as delimiter ? – Rizier123 May 15 '15 at 09:06
  • Your question is not specific to Laravel as your tags suggested – Emeka Mbah May 15 '15 at 09:10
  • @Rizier123 The csv file will only contain email addresses and two letters. So I know that any commas, pipes or semicolons are delimiters – Christoffer May 15 '15 at 09:16
  • 2
    You might be surprised at the characters that are actually allowed in an email address...The local-part of the email address may use any of these ASCII characters RFC 5322 Section 3.2.3: Letters (a–z, A–Z) Digits 0 to 9 Chars !#$%&'*+-/=?^_`{|}~ Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively. Special allowed with restrictions: Space and "(),:;<>@[\] contained between quotation marks, and the space, backslash \ and quotation mark " must also be preceded by a backslash \. – Gavin Jackson May 15 '15 at 09:20
  • It's not standardized, but within a single CSV file there can be only one. How about scanning each file and counting the number of times a delimeter is present? – MeanGreen May 15 '15 at 09:21

3 Answers3

6

You cannot reliably determine the delimiter of a csv file if you don't know it. Take this simple example:

foo;bar,hello;world

What is the delimiter? , or ;? If you would pass an array array(',',';') what data are you expecting fgetcsv() will return?

If you don't know the delimiter you need to ask the user for it.

hek2mgl
  • 152,036
  • 28
  • 249
  • 266
  • The added comment states there are only email addresses in the file, so unless the list is very short this is not true. – MeanGreen May 15 '15 at 09:23
  • The csv's will contain email addresses and the letter A or B, so I know that any ; , | etc are delimiters. So in your example both , and ; would be delimiters. – Christoffer May 15 '15 at 09:23
  • 3
    Then it is not CSV anymore. CSV supports a single delimiter only. Why not replacing the `;` by `,` before parsing? Unless you have a huge input file it might be ok. – hek2mgl May 15 '15 at 09:24
  • @Christoffer Then try all variations in turn and validate whether the result matches your requirements (result array length == 2, first item validates as email, second as letter). – deceze May 15 '15 at 09:24
  • @hek2mgl The thing is that it will most probably only contain a single delimiter - but I will not know which one. So it will be a CSV, but I will not know the delimiter. – Christoffer May 15 '15 at 09:26
  • 1
    @hek2mgl I'm thinking of replacing the characters like you say. Might be the best solution. The second best might be to ask the user for the delimiter. – Christoffer May 15 '15 at 09:28
  • @Christoffer ... meaning you indeed need to determine the delimiter. :) Probably it will be possible to determine the delimiter in your special edge case where the data contains only email addresses, however this is not part of the CSV standard and makes no sense to get implemented in `fgetcsv()` - for obvious reasons. However, what about my suggestion to replace `;` by `,` before parsing and then using `,` as the delimiter? – hek2mgl May 15 '15 at 09:30
  • 1
    @Christoffer Yeah, your last comment is the way to go! +1 :) – hek2mgl May 15 '15 at 09:31
  • 1
    Btw, there is no *CSV standard* ... (I've used that term, but it is wrong)... More or less CSV is a silent agreement and there are a lot of variations of it. – hek2mgl May 15 '15 at 09:33
  • 1
    @hek2mgl Yes, the lack of standard is a problem. Together with the fact that Microsoft won't let users just pick the f¤%!/"%ing delimiter when saving a excel file to csv (you need to change you system locale settings). So you can't tell noob users which delimiter to use, since it's hard enough for them to just pick another format when saving. – Christoffer May 15 '15 at 09:40
  • @Christoffer Yeah, it's all about to earn billons by maintaining the CSV delimiter!! :) Sounds simple, isn't it? ... – hek2mgl May 15 '15 at 09:44
  • @hek2mgl funny: `csv` means **c** `omma` **s** `eparated` **v** `alues`, but the delimiter doesn't have to be a comma :) (Is it also raining in Germany ?) – Rizier123 May 15 '15 at 09:49
  • Yeah, even the name *csv* is a lie itself! :) – hek2mgl May 15 '15 at 09:54
  • 1
    @Rizier123 Oh missed that. No rain in Germany plus I have a day off! :) Grüezi! :) – hek2mgl May 15 '15 at 10:20
1

Use the league/csv package, this has a detection feature and more features that make it worth using it over fgetcsv.

Ezra
  • 1,388
  • 1
  • 13
  • 14
  • Basically this is a so called *link only answer*. But I also need to emphasize that *of course* also that library does *not* accept an array of delimiters. – hek2mgl May 15 '15 at 11:32
0

Just to provide a simple answer for anyone else struggling with this. The solution I settled on was to replace all the delimiters to a specific character. Like this:

private function replaceDelimiters($file)
{
    // Delimiters to be replaced: pipe, comma, semicolon, caret, tabs
    $delimiters = array('|', ';', '^', "\t");
    $delimiter = ',';

    $str = file_get_contents($file);
    $str = str_replace($delimiters, $delimiter, $str);
    file_put_contents($file, $str);
}

Note that this will replace all given characters to a ','. So this will only be a good option if you know that those characters will only be used as delimiters, not content. (as in my case)

Also did a blog post about it.

Christoffer
  • 7,470
  • 9
  • 39
  • 55