I have a CSV file where the first "cell" of each line is just an int: 9 on the first line, 10 on the next line, and so on. When I do $array = fgetcsv($file); the first cell of the first line has these weird characters in front of the value: ˇ˛

It's messing with my database import since this cell is supposed to only contain an int. It only happens on the first cell of the first line.

Any ideas on why this is happening and what I can do to avoid it?

CR47
  • Was the CSV file saved with a BOM header? – Mark Baker Nov 21 '13 at 15:14
  • @MarkBaker It was a CSV exported from an MSSQL database, imported into OpenOffice and saved again as a CSV. How can I tell if it was saved with a BOM header? – CR47 Nov 21 '13 at 15:17
  • Most text editors that support such things have an option that shows which encoding is set. For example, I use Notepad++, and there's an `encoding` option on the menu that shows me the current encoding and lets me change it – Mark Baker Nov 21 '13 at 15:22
  • possible duplicate: http://stackoverflow.com/questions/3255993/how-do-i-remove-i-from-the-beginning-of-a-file – bitWorking Nov 21 '13 at 15:28
  • @MarkBaker Saving as UTF-8 without BOM worked. May want to make it an answer so it can be accepted. – CR47 Nov 21 '13 at 15:35
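As an alternative to opening the file in an editor, here is a minimal sketch of checking for a BOM from PHP itself: it reads the first three bytes and compares them with the UTF-8 and UTF-16 BOMs. detect_bom() is a hypothetical helper name, not a PHP built-in, and UTF-32 BOMs are not distinguished. One hint from the question itself: ˇ˛ is how the bytes FF FE look when decoded as Mac OS Roman, and FF FE is the UTF-16 little-endian BOM, so the file may have been saved as UTF-16 rather than as UTF-8 with a BOM.

```php
<?php
// Hypothetical helper: report which byte order mark (BOM), if any, a file starts with.
// Only the UTF-8 and UTF-16 BOMs are checked; a UTF-32LE BOM (FF FE 00 00)
// would be reported as UTF-16LE because it starts with the same two bytes.
function detect_bom(string $path): ?string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // The longest BOM checked here (UTF-8) is three bytes.
    $head = fread($handle, 3);
    fclose($handle);

    if ($head === false) {
        return null;
    }
    if (strncmp($head, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';
    }
    if (strncmp($head, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    if (strncmp($head, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    return null; // no recognised BOM
}
```

Calling detect_bom() on the exported CSV's path returns 'UTF-8', 'UTF-16BE', 'UTF-16LE', or null when the file has no BOM.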

3 Answers

As others suggested, the weird characters are a byte order mark (BOM). To remove it, you can use the following snippet:

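// $value is a cell read with fgetcsv(), e.g. $value = $array[0];
if (mb_detect_encoding($value) === 'UTF-8') {
    // Delete a possible UTF-8 BOM (bytes EF BB BF).
    // Not all UTF-8 files start with these three bytes, and a BOM
    // can only appear at the very start, hence the ^ anchor.
    $value = preg_replace('/^\x{EF}\x{BB}\x{BF}/', '', $value);
}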
hpaknia
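The snippet above cleans a value after it has been read. Another option, sketched below under the assumption that the file is UTF-8, is to skip the BOM once at the file level so fgetcsv() never sees it: peek at the first three bytes and rewind if they are not a BOM. read_csv_rows() is a hypothetical name, and this sketch does not handle UTF-16 files.

```php
<?php
// Hypothetical helper: read every row of a UTF-8 CSV file with fgetcsv(),
// skipping a UTF-8 BOM (EF BB BF) at the very start of the file if present.
function read_csv_rows(string $path): array
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Peek at the first three bytes; if they are not a BOM, go back to the start.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }

    $rows = [];
    // Length 0 means "no line length limit"; the delimiter, enclosure and
    // escape characters are passed explicitly (they match PHP's defaults).
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}
```

Handling the BOM at the stream level means every cell, including the first one, arrives clean, so no per-value check is needed.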

I ran into this problem today. This is what appeared in the first cell of the first row:

123465

The solution I had was to add this to my HTML head:

<meta charset="UTF-8">

The result then became:

123456

This is because my CSV file was encoded in UTF-8, so by declaring the character set as UTF-8 I was able to get the intended results.

Muhammad Abdul-Rahim
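One caveat worth checking with this approach: declaring the page's charset only changes how the browser decodes the output. A UTF-8 BOM is then decoded as U+FEFF, which is invisible, but the bytes are still part of the string PHP read, so a database import or an integer cast can still fail. A minimal sketch that shows this, using str_getcsv() on an in-memory line instead of a real file for brevity:

```php
<?php
// A first CSV line as it would be read from a UTF-8 file saved with a BOM.
$line = "\xEF\xBB\xBF123456,foo";
$cells = str_getcsv($line, ',', '"', '\\');

// The BOM bytes are still there, whatever charset the HTML page declares.
echo bin2hex($cells[0]), PHP_EOL;   // efbbbf313233343536

// Because the string does not start with a digit, an integer cast gives 0.
var_dump((int) $cells[0]);          // int(0)

// Stripping the three BOM bytes restores the expected integer.
$clean = substr($cells[0], 0, 3) === "\xEF\xBB\xBF" ? substr($cells[0], 3) : $cells[0];
var_dump((int) $clean);             // int(123456)
```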

Sounds like you have a Unicode file and are picking up the byte order mark (BOM).

elixenide
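If the file turns out to be UTF-16 rather than UTF-8, removing a few bytes is not enough, because every character takes two bytes. A minimal sketch, assuming the mbstring extension is installed and the file is small enough to load into memory: detect the UTF-16 BOM, drop it, convert the rest to UTF-8 with mb_convert_encoding(), then parse the result with fgetcsv() through a php://memory stream. utf16_csv_to_rows() is a hypothetical name.

```php
<?php
// Hypothetical helper: parse raw CSV bytes that may be UTF-16 with a BOM.
// Input without a UTF-16 BOM is assumed to already be UTF-8 and is parsed as-is.
function utf16_csv_to_rows(string $raw): array
{
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        // UTF-16 little-endian: drop the BOM, convert the rest to UTF-8.
        $raw = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        // UTF-16 big-endian: same idea.
        $raw = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    }

    // Feed the UTF-8 text to fgetcsv() via an in-memory stream so that
    // quoted fields containing line breaks are still parsed correctly.
    $stream = fopen('php://memory', 'r+b');
    fwrite($stream, $raw);
    rewind($stream);

    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}
```

Usage would look like utf16_csv_to_rows(file_get_contents('export.csv')), where export.csv is a placeholder path.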