I have a client who wants to export a .csv to the server, where PHP will parse it to generate a table from its data. I'm using iconv to convert it to the appropriate encoding (UTF-8). Unfortunately I'm on Windows, so I don't know what the source encoding is.

What encoding would Excel for Mac use to generate a .csv? I've tried many different combinations, but none of them handle the French accents, which, as far as I know, are not at the same positions in the Mac's charset as in UTF-8.

For example:

The correct display should be: 'Délégation'

Most source encodings I've tried (including using utf8_encode()) give: 'DÈlÈgation'

macintosh to UTF-8 gives: 'D»l»gation'

If I open the .csv file (saved on the Mac) on my PC, the French 'é' shows up as 'È'. Is it possible that saving the file onto my computer (or server) converts it directly to UTF-8, so that the 'È' characters are now the actual stored values rather than a misinterpretation of the encoding?

Hex Dump

Using bin2hex(), the hex dump for the string: 'DÈlÈgation 1' is: 44c86cc8676174696f6e2031

In fact, I'm assuming the string is DÈlÈgation and not Délégation because when I open the .csv file in Notepad (on my PC), it shows È and not é.
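
To see what the dumped bytes actually are, here is a minimal sketch in Python (chosen only for illustration; the variable names are hypothetical) that decodes the hex string above under a few candidate encodings. It suggests the data is not UTF-8 at all, and that the byte `0xc8` reads as 'È' in Windows-1252 and as '»' in MacRoman, which matches both results reported above.

```python
# Hex dump reported by bin2hex() in the question.
raw = bytes.fromhex("44c86cc8676174696f6e2031")

# Decode the same bytes under two single-byte encodings.
as_cp1252 = raw.decode("cp1252")       # Windows-1252, what Notepad assumes here
as_macroman = raw.decode("mac_roman")  # MacRoman, usually "macintosh" in iconv

# Check whether the bytes are valid UTF-8 at all.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(as_cp1252)      # DÈlÈgation 1
print(as_macroman)    # D»l»gation 1
print(is_valid_utf8)  # False
```

If the file had been re-saved as UTF-8, 'È' would be stored as the two bytes `c3 88`, not the single byte `c8`, so saving the file does not appear to have converted it.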

Prusprus
  • It's not in any encoding, there is no way to know what kind of screw up of different conversions led to the file eventually having the byte `0xc8` in place of `é`. – Esailija Jan 15 '13 at 04:35
  • I suppose then I'm best just doing a str_replace() on every incorrect character as they are caught in the text? – Prusprus Jan 15 '13 at 13:48
  • You said it was supposed to be `Délégation`. If it's supposed to be `DÈlÈgation` then it's obviously in Windows-1252. – Esailija Jan 15 '13 at 14:13
  • Yes it's supposed to be Délégation. The .csv file, when opened in notepad or any text editor on my PC shows up as DÈlÈgation. – Prusprus Jan 15 '13 at 16:44
  • All your notepad is doing is decoding it as windows-1252, it doesn't magically show the correct text any more than your php code would when decoding as windows-1252. – Esailija Jan 15 '13 at 16:59

2 Answers

A common encoding for Mac programs to use is MacRoman.
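
As a rough illustration of what a MacRoman source would look like, the sketch below (Python, with a hypothetical helper named `macroman_to_utf8`) encodes 'Délégation' as MacRoman and converts it to UTF-8. Under that assumption 'é' is the single byte `0x8e`. The dump in the question has `c8` in that position instead, so that particular file was probably not plain MacRoman by the time it was dumped.

```python
def macroman_to_utf8(data: bytes) -> bytes:
    """Re-encode MacRoman bytes as UTF-8 (hypothetical helper for illustration)."""
    return data.decode("mac_roman").encode("utf-8")


# What a genuine MacRoman export of the word would contain.
sample = "Délégation".encode("mac_roman")
print(sample.hex())  # 448e6c8e676174696f6e

# Converting those bytes yields correct UTF-8 text.
converted = macroman_to_utf8(sample)
print(converted.decode("utf-8"))  # Délégation
```

In PHP the same conversion would normally go through iconv with the MacRoman charset name as the source encoding; the exact name accepted depends on the iconv build.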

Ignacio Vazquez-Abrams

Could your client install the trial version of Apple Numbers from the Apple website, open the .csv file in Numbers, then go to "File", "Export", "CSV", choose either "UTF-8" or "Windows Latin 1", and send you both the UTF-8 and the Windows Latin 1 files? Numbers on a Mac sometimes solves problems encountered with Excel.

FabricePMW