644

I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also seems to do this with the left and right quotes and long dashes that appear to come from the original user creating the Excel file on a Mac.

Since CSV is just a text file I'm sure it can handle a UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.

Jeff Treuting
  • 13,910
  • 8
  • 36
  • 47
  • http://stackoverflow.com/questions/155097/microsoft-excel-mangles-diacritics-in-csv-files /// – Dr. belisarius Nov 19 '10 at 01:26
  • 15
    I saw that question before I posted but it is about already having a UTF8 CSV and opening it in Excel not the other way around. – Jeff Treuting Nov 19 '10 at 06:36
  • 8
    The answers below seem to work, but they're just workarounds. Does anyone know how to make Excel do this? – NielW Jan 27 '14 at 21:43
  • 2
    I wanted to drop my 2 cents: After a lot of trial and error, attempts to use VBA functions, etc... (I'm using Excel 97 for various reasons)... simply Saving-As to "CSV (MSDOS)" format resolved this issue for me. In my case, Citrix webinar (GotoWebinar) CSV data ends up with characters to break some Apex code on our end - saving to "CSV (MSDOS)" resolves me (previously) loading the CSV export into Notepad++ and pasting it into a UTF-8 blank file and resaving. :-P :-) – AMM Feb 04 '15 at 14:12
  • 3
    This is a very annoying Excel limitation. I opened a uservoice feature request about this: http://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/10006149-support-saving-csv-in-utf-8-encoding, feel free to vote and make it happen – Doron Yaacoby Oct 01 '15 at 07:44
  • 2
    In my language, Excel *automatically* transforms every "..." or "oe" or "ae" into the Unicode equivalent (…, œ, æ), but when it's time to export, it exclaims _"oh my god, what are those characters??"_ Seriously, Microsoft... are you KIDDING? – Martin Oct 23 '15 at 12:53
  • @JeffTreuting, this https://www.itg.ias.edu/content/how-import-csv-file-uses-utf-8-character-encoding-0 worked for me. – shweta Apr 21 '16 at 07:04
  • 7
    according to the Excel uservoice, Microsoft started work on UTF-8 for .csv last month https://excel.uservoice.com/forums/304921-excel-for-windows-desktop-application/suggestions/10006149-support-saving-csv-in-utf-8-encoding – moloko Jul 19 '16 at 13:32
  • Excel uservoice now reports it will be available with Excel 16.0.7466.2023 and later. – John Paul Mar 13 '17 at 19:52
  • 1
    Still can't export tab delimited UTF-8. The ability to specify delimiter and encoding was a feature in open office years ago. – Jim W Aug 29 '17 at 15:53
  • techlandia.com/guardar-csv-utf8-como_163552 "Save as File Type" -> "CSV (Comma delimited)" -> "Tools" -> "Web Options" "Encoding" > "Save this document as" -> "Unicode (UTF-8)" – ztvmark Nov 10 '20 at 22:01

36 Answers

421

A simple workaround is to use Google Spreadsheet. Paste (values only if you have complex formulas) or import the sheet then download CSV. I just tried a few characters and it works rather well.

NOTE: Google Sheets does have limitations when importing. See here.

NOTE: Be careful of sensitive data with Google Sheets.

EDIT: Another alternative - basically they use VB macro or addins to force the save as UTF8. I have not tried any of these solutions but they sound reasonable.

nevets1219
  • 7,692
  • 4
  • 32
  • 47
  • 1
    Thanks! It's a simple solution, far better that struggling with Excel. – charlax Jan 10 '12 at 22:17
  • Added another possible solution (untested) without having to use Google Spreadsheet. – nevets1219 Aug 24 '12 at 17:52
  • 69
    Perhaps Joel Slotsky (former PM of Excel) could send them his *decade-old* post on [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets](http://www.joelonsoftware.com/articles/Unicode.html)? – Indolering May 23 '13 at 05:56
  • 9
    With the normal Windows Notepad (using Save As, then choosing UTF-8 in the encoding option) this worked for me. That's the best approach for me since it must be done by users who have no admin rights on their machines, so installing additional software is not required. – Fer Oct 02 '13 at 09:06
  • 9
    OpenOffice JustWorks(tm) with no hassle - I think it should be included – Robert Cutajar Oct 02 '13 at 10:31
  • I have no experience with OpenOffice but Yessus has submitted something in regards to OpenOffice - if it's not the same thing you should add it as answer – nevets1219 Oct 02 '13 at 23:14
  • @nevets1219 I did import the CSV into Google Spreadsheet but it didn't affect my character issue. Forgetting excel/google and using open office calc did solve my UTF8 issue (inserting my CSV into Excel 2013 as a "text" also failed by the way) – JinSnow Jul 11 '14 at 09:02
  • 10
    Uhhh. I just had a shiver down my back. What if your excel file has 200,000 lines? Or contains sensitive data you do not want to include in an excel spreadsheat? Use Openoffice/Libreoffice if you need to. – Seb Oct 02 '14 at 08:48
  • @Seb use the solution that fits the problem. I'm pretty sure Google spreadsheet can handle large files and they are private if you want them to be. Still a local application will do better in both cases if it's available. – nevets1219 Oct 02 '14 at 15:21
  • I just tried it - and fyi you need to replace the whole file, not just copy the contents as Excel seems to break the text format of the file not the contents. Not sure how they broke it but Google Spreadsheets worked perfectly and as expected. – UKDataGeek Oct 27 '14 at 10:37
  • Here is direct link to Jaimon's UnicodeCSV Addin: https://jaimonmathew.wordpress.com/2011/08/23/excel_addin_to_work_with_unicode_csv/ – Vadzim Feb 12 '16 at 09:56
  • 1
    Google Spreadsheet does not let you choose the delimiter when exporting, so you need to stick with comma if you export a CSV. – Ivo Pereira Feb 12 '16 at 17:38
  • CSV gives you comma delimited and text gives you tab delimited. But I do believe you are correct about delimiters. – nevets1219 Feb 12 '16 at 18:47
  • 2
    What if the file size is big? Google Sheets has a size limit around 75M if I remember correctly. I have a file that is 700M – MLE Jul 26 '16 at 04:30
  • https://techlandia.com/guardar-csv-utf8-como_163552/ "Save as File Type" -> "CSV (Comma delimited)" -> "Tools" -> "Web Options" "Encoding" > "Save this document as" -> "Unicode (UTF-8)" – ztvmark Nov 10 '20 at 21:59
  • Old answer but still valid 10years later. In my case I wanted to keep semicolon separator so I pasted CSV content on GoogleDocs (because there was not this choice of separator on Sheets), exported as .txt & renamed to .csv. Works. – ClemM Dec 03 '20 at 20:57
145

I've found OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.

In the "Save As..." dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way AFAIK.

calc save dialog

aendra
  • 5,286
  • 3
  • 38
  • 57
  • I'm sure this works, I just don't happen to have OpenOffice already so Google Docs was easier in my situation. but thanks for the suggestion – Jeff Treuting Nov 19 '10 at 07:11
  • 21
    OpenOffice Calc has more and better options when opening and saving "CSV" files (field separators, encoding etc) than both Google Docs and Excel. Also, Google Docs currently suffers from a limit of 400,000 cells per spreadsheet, which OpenOffice Calc does not. – Christian Davén Sep 20 '12 at 12:43
  • 4
    I can confirm LibreOffice also works: it offers character encoding options on export that are sadly lacking in Excel. – Rupert Rawnsley Oct 28 '14 at 11:28
  • 1
    This is a great option for those working with sensitive data. The OpenOffice suite can be a portable install for those with locked down machines. – Nathan Aug 08 '17 at 17:25
122
  1. Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are in UTF16 (note, not in UTF8). However, the new "*.txt" file is TAB delimited, not comma delimited, and therefore is not a true CSV.

  2. (optional) Unless you can use a TAB delimited file for import, use your favorite text editor and replace the TAB characters with commas ",".

  3. Import your *.txt file in the target application. Make sure it can accept UTF16 format.

If UTF-16 has been properly implemented with support for non-BMP code points, then you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.

I use this procedure to import data from Excel to Moodle.
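
The three steps above can also be scripted. Here is a minimal Python sketch (not from the answer; the helper name and filenames are placeholders) that converts the UTF-16 tab-delimited "Unicode Text" export to a comma-delimited UTF-8 CSV in one pass:

```python
import csv

def unicode_txt_to_csv(src_path, dst_path):
    """Convert Excel's "Unicode Text" export (UTF-16, tab-delimited)
    into a comma-delimited UTF-8 CSV."""
    with open(src_path, encoding="utf-16") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter="\t"):
            writer.writerow(row)
```

Going through the csv module, rather than blindly replacing tabs with commas, keeps fields that themselves contain commas or tabs correctly quoted.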

Flimm
  • 136,138
  • 45
  • 251
  • 267
elomage
  • 4,334
  • 2
  • 27
  • 23
  • 5
    Finally something that worked! Tried the above Excel options on Excel 2013 without success. I just switched to using \t as split character when I parsed it and it worked perfect! – Mattias Lindberg Jan 06 '14 at 08:16
  • 1
    Needed to export an XLS as CSV to import in MySQL. Using Excel 2003 I exported in format "Unicode Text (.txt)", then used Notepad++ to replace the TAB with `;`, then imported the txt-file into phpmyadmin with default "Character set of the file: utf-8", Format "CSV using LOAD DATA". All encoding was transferred correctly. – Avatar May 26 '14 at 17:29
  • For smaller files and simple data I use this shortcut: select the area or columns that I want to export, then copy (Ctrl+C). Then go to my text editor (for example, Gedit in Ubuntu) and paste. The text editor has all the data as TAB delimited. Then save as a text file, usually in UTF8. The final format may depend on your text editor configuration. – elomage May 27 '14 at 19:19
  • That's the best workaround for me! Now I can import import files via phpMyAdmin easily! – lrepolho Sep 26 '14 at 15:13
  • 3
    Thank you. This makes sense. Why MS still refuses to use UTF as a standard beats me. – Oskar Limka Jul 16 '15 at 22:28
  • Obviously (2) has a limitation - if any column has a comma (,) in it, then you have to quote that column. (Or does the international option quote all columns?) – Thomas Andrews Nov 24 '15 at 17:33
  • 4
    @OskarLimka: When you say "UTF", do you mean UTF-8 or UTF-16? Because Microsoft does use UTF-16 quite a bit. – Flimm Mar 18 '16 at 09:57
  • @Flimm, I meant UTF-8, I'm not very familiar with UTF-16 as in my work I need to have full compatibility with previous ASCII and ISO-8859-1 files. I'm not sure UTF-16 is backward compatible in this respect. – Oskar Limka Mar 24 '16 at 21:50
  • Some characters get converted to "TAB" character on the file I tried this method – Rade_303 Mar 02 '17 at 16:23
  • 9
    *"The good news is that all the international characters are in UTF16 (note, not in UTF8)."*: **complete nonsense**. UTF-8 and UTF-16 are two ways to encode the whole set of Unicode code points. –  Mar 21 '18 at 20:23
  • @Kai Noack, if you are using Notepad++ anyway, you can just open the original non utf-8 csv and change the encoding from the Encoding option on the menu bar (might save you a few steps). – James Mar 28 '18 at 01:02
  • It doesn't work in my case – user85361 Aug 12 '20 at 07:13
46

I know this is an old question but I happened to come upon this question while struggling with the same issues as the OP.

Not having found any of the offered solutions a viable option, I set out to discover if there is a way to do this just using Excel.

Fortunately, I have found that the lost character issue only happens (in my case) when saving from xlsx format to csv format. I tried saving the xlsx file to xls first, then to csv. It actually worked.

Please give it a try and see if it works for you. Good luck.

Eric
  • 469
  • 4
  • 2
  • 3
    For me, on Excel for Mac 2011, this works but only if I choose `Windows comma separated (CSV)`. It doesn't work if I use the default or DOS CSV options - both these replace the accented characters with random junk characters. Tested for characters including `é`,`è`,`â`... Don't know if it's real UTF8 but the characters aren't mangled. – user56reinstatemonica8 Jul 30 '13 at 16:28
  • 13
    Quick confirmation - the files produced with this method on (Excel for Mac 2011) **do not** produce UTF-8 csvs, **BUT**, they do produce CSVs that at least contain the correct characters and can therefore can be **painlessly converted to UTF8 in a text editor**, which is a big step forward from the ludicrous mangled junk that Excel spits out by default. – user56reinstatemonica8 Jul 30 '13 at 16:55
  • Yes, agreed, this worked for me as well (Excel Mac 2011) and it really deserves more upvotes. – cbmanica Sep 05 '13 at 00:16
  • 1
    This didn't quite work for me (using Excel 2007). I had 2 non-ASCII characters in my file and one of them was saved OK this way, the other wasn't. – EM0 Jul 18 '15 at 20:50
  • 1
    Note that the resulting CSV file will be in UTF-16, not UTF-8 as the question asked. – Flimm Mar 18 '16 at 09:58
38

You can use iconv command under Unix (also available on Windows as libiconv).

After saving as CSV under Excel in the command line put:

iconv -f cp1250 -t utf-8 file-encoded-cp1250.csv > file-encoded-utf8.csv

(remember to replace cp1250 with your encoding).

Works fast and great for big files like post codes database, which cannot be imported to GoogleDocs (400.000 cells limit).

pmilewski
  • 397
  • 3
  • 3
  • 5
    That is useless if your content contains characters which can not be encoded in 1250, a better way would be to export as "Unicode .txt" in Excel and use iconv to convert from Utf16. Maybe also do a `sed` or `tr` to translate from '\t' to ',' – Sebastian Jul 05 '14 at 13:11
  • 5
    Excel's default encoding seems to be CP858 when saving as CSV or MS-DOS CSV, and Windows 1252 when saving as Windows CSV (as tested on Excel for Mac 2011). – claymation Jul 30 '14 at 18:45
  • 4
    I completely agree that this is useless, as when Excel saves in .csv format, it loses information when it comes to Unicode code points that can't be encoded in a one-byte-per-code-point encoding. – Flimm Mar 18 '16 at 09:31
36

You can do this on a modern Windows machine without third party software. This method is reliable and it will handle data that includes quoted commas, quoted tab characters, CJK characters, etc.

1. Save from Excel

In Excel, save the data to file.txt using the type Unicode Text (*.txt).

2. Start PowerShell

Run powershell from the Start menu.

3. Load the file in PowerShell

$data = Import-Csv C:\path\to\file.txt -Delimiter "`t" -Encoding BigEndianUnicode

4. Save the data as CSV

$data | Export-Csv file.csv -Encoding UTF8 -NoTypeInformation
Don Cruickshank
  • 5,641
  • 6
  • 48
  • 48
  • 3
    This method worked perfectly for a CSV file with over 15,000 records, with lines of more than the 1024 character limit imposed by Notepad. Takes seconds, and doesn't use third party software. Thanks! – absolute Feb 27 '18 at 21:08
  • Ahh, even when I tried using Google Sheets, I ran into the same problem. So maybe this PowerShell strategy would have worked. Here was my problem. For certain characters such as certain emojis, you need to use `CHARACTER SET utf8mb4` as described here: https://stackoverflow.com/a/10959780/470749 – Ryan Sep 03 '18 at 16:35
  • 1
    @Ryan That issue is specific to MySQL. I just tried the problematic character in that question and it worked fine in both Excel and PowerShell. – Don Cruickshank Sep 03 '18 at 21:51
  • Worked wonderfully! It's also possible to pipe the two commands together directly: `Import-Csv ... | Export-Csv ...` without using an intermediate `$data` variable. – Sjlver Feb 02 '21 at 10:31
26

The only "easy way" of doing this is as follows. First, realize that there is a difference between what is displayed and what is kept hidden in the Excel .csv file.

  1. Open an Excel file where you have the info (.xls, .xlsx)
  2. In Excel, choose "CSV (Comma Delimited) (*.csv) as the file type and save as that type.
  3. In NOTEPAD (found under "Programs" and then Accessories in Start menu), open the saved .csv file in Notepad
  4. Then choose -> Save As... and at the bottom of the "save as" box, there is a select box labelled as "Encoding". Select UTF-8 (do NOT use ANSI or you lose all accents etc). After selecting UTF-8, then save the file to a slightly different file name from the original.

This file is in UTF-8 and retains all characters and accents and can be imported, for example, into MySQL and other database programs.

This answer is taken from this forum.

phuclv
  • 37,963
  • 15
  • 156
  • 475
Nick
  • 966
  • 1
  • 10
  • 20
  • 10
    This is incorrect, because of step 2, saving as CSV. The trouble with this is that Excel save the CSV file in cp1252, which is a one-byte-per-code-point encoding. This leads to loss of information for the characters which can't fit in one byte. – Flimm Mar 18 '16 at 10:10
  • It worked for me, but i cannot understand why. My file generated by javascript is always read incorrectly in excel (as a different encoding). But when i open in notepad and save as utf-8, it works fine! So, looks like there is a metadata with the encoding. How it works? If notepad can save UTF-8 CSV files and Excel can read them, it's possible in my program generate correct UTF-8 CSV files, that Excel can read? – felipeaf Aug 29 '16 at 20:17
  • Worked for me. In an XSLX converted from Access. Diacritics and LF-only and ISO-date formats are all OK. One or another did not work with more popular solutions. – RolfBly Oct 16 '17 at 15:37
22

Another one I've found useful: "Numbers" allows encoding-settings when saving as CSV.

leander
  • 229
  • 2
  • 2
  • 8
    ^ it's an application in Mac OSX – Sruit A.Suk Dec 20 '14 at 00:18
  • also be careful with Numbers as it has limitation in the number of rows, and I have converted data like this before not realizing it had cropped some of it. Excel/CSV has much higher limits. – MrE Dec 07 '17 at 23:31
16

Using Notepad++

This will fix the corrupted CSV file saved by Excel and re-save it in the proper encoding.

  • Export CSV from Excel
  • Load into Notepad++
  • Fix encoding
  • Save

Excel saves in CP-1252 / Windows-1252. Open the CSV file in Notepad++. Select

Encoding > Character Sets > Western European > Windows-1252

Then

Encoding > Convert to UTF-8
File > Save

First tell Notepad++ the encoding, then convert. Some of the other answers convert without setting the proper encoding first, mangling the file even more: they turn what should be an accented character into mojibake. If your character does not fit into CP-1252 then it was already lost when the file was saved as CSV. Use another answer for that.

Community
  • 1
  • 1
Chloe
  • 25,162
  • 40
  • 190
  • 357
  • I may get this wrong, but you can't save the file as a ".csv" in Notepad++ and thats what this is about. – Daniel Maurer Jul 26 '18 at 08:19
  • 1
    Yes you can. You are only just saving a text file, and `.csv` is a text file. This answer will open the CSV file corrupted by Excel, fix it, then re-save it with the proper encoding. – Chloe Jul 26 '18 at 18:05
  • What you are communicating has far reaching consequences. Employing this eminent method, we can attain a higher data quality! Bravo! (Quiz: why is this comment so quizzical?) – thymaro May 26 '20 at 14:45
14

"nevets1219" is right about Google docs, however if you simply "import" the file it often does not convert it to UTF-8.

But if you import the CSV into an existing Google spreadsheet it does convert to UTF-8.

Here's a recipe:

  • On the main Docs (or Drive) screen click the "Create" button and choose "Spreadsheet"
  • From the "File" menu choose "Import"
  • Click "Choose File"
  • Choose "Replace spreadsheet"
  • Choose whichever character you are using as a Separator
  • Click "Import"
  • From the "File" menu choose "Download as" -> CSV (current sheet)

The resulting file will be in UTF-8

RedYeti
  • 1,024
  • 14
  • 28
  • 2
    The trouble with this answer is how you generated the CSV file in the first place. If you did that by simpling saving as CSV in Excel, the CSV file will be in cp1252, which is a one-byte-per-code-point encoding. This will lead to loss of information when it comes to characters that cannot fit in one byte. – Flimm Mar 18 '16 at 10:11
  • Well, that's a problem with any of these answers. And will be with any answer to how to convert to UTF-8 since there's no way to know or control what the original encoding was. – RedYeti Mar 21 '16 at 15:47
  • 1
    Some of the answers do talk about how to get round that issue, for instance, http://stackoverflow.com/a/15500052/247696 – Flimm Mar 25 '16 at 08:52
13

Under Excel 2016 and up (including Office 365), there is a CSV option dedicated to the UTF-8 format.

In Office 365, do Save As; where previously one might have chosen CSV (Comma Delimited), now one of the file types you can save as is CSV UTF-8 (Comma delimited) (*.csv)

fantabolous
  • 21,470
  • 7
  • 54
  • 51
  • 2
    You should provide some instructions on how to use that option. – rovyko Mar 15 '18 at 02:21
  • 1
    @dexgecko sure: it's called the "Save" button/menu. There are now four CSV export types in Excel 2016 on Windows: **CSV** saves as "ANSI" (more or less equivalent to Latin-1, but that may be different with a non-Western OS install, I'm not sure), **UTF-8 CSV** saves as UTF-8 with BOM, CSV (DOS) saves as CP850 (again, may depend on the install?) and CSV (Mac) saves as MacRoman. –  Mar 21 '18 at 20:38
  • However, Excel will only be able to read UTF-8 and ANSI CSV files (Excel can use the BOM to choose). To import from another encoding, rename as .txt, open from Excel (you have then a long list of encodings to choose from), and since it won't interpret the separator correctly, use the "convert" button to split the rows. You can also use this trick when importing a CSV from a different language convention (in french for instance, the filed separator is a semicolon, as the comma is already used as decimal separator). –  Mar 21 '18 at 20:45
  • A bit OT, but anyway: it is simpler with VBA, as you can use the **Origin** argument of [Workbook.Open](https://msdn.microsoft.com/en-us/vba/excel-vba/articles/workbooks-open-method-excel) when you open a CSV file. But this will only work with the few encodings available for CSV. One may use [Workbook.OpenText](https://msdn.microsoft.com/en-us/vba/excel-vba/articles/workbooks-opentext-method-excel), for which the **Origin** can be any encoding, alas that does not work for a CSV file encoded in UTF-8 without BOM! But if the file has the .txt extension, it works! Looks like a bug to me. –  Mar 21 '18 at 22:40
  • 3
    @Jean-ClaudeArbaut Strange, I don't see a UTF-8 CSV option in my Excel 2016. – rovyko Mar 22 '18 at 03:24
  • 1
    @dexgecko Strange, indeed. According to [this page](http://micka39.info/2016/11/17/office-2016365-excel-permet-dexporter-en-csv-avec-lencodage-utf-8/) (in french, sorry), the functionality was added in november 2016, in build version 1610. I have currently the 1802 version. However, I thought only Office 365 had that kind of evolutions, and I'm using Office Pro 2016 (not the 365 flavor). Maybe try to update your Office. –  Mar 22 '18 at 06:25
  • @dexgecko A quick check on https://products.office.com/ seems to show the professional "one time" version is not sold any longer though. I don't know what happens to those like me who have this product. Thanks for having pointed this out! –  Mar 22 '18 at 06:53
  • FWIW, This works on Excel 2016 version 1909 build 12026.20334. I'm pretty sure it did not work properly until a few weeks ago, cause I had to resort to Nick's solution above to keep diacritics intact. Possibly the fix was not in Excel itself, but in a Windows update. – RolfBly Oct 21 '19 at 13:13
9

What about using PowerShell?

Get-Content 'C:\my.csv' | Out-File 'C:\my_utf8.csv' -Encoding UTF8
Michael Schau
  • 101
  • 1
  • 2
  • 1
    The text of non-ascii Spanish characters will be changed from the original Spanish character to the encode string used by ANSI. So, the text will not be the same as the original. – Jason Williams Nov 28 '17 at 15:18
8

For those looking for an entirely programmatic (or at least server-side) solution, I've had great success using catdoc's xls2csv tool.

Install catdoc:

apt-get install catdoc

Do the conversion:

xls2csv -d utf-8 file.xls > file-utf-8.csv 

This is blazing fast.

Note that it's important that you include the -d utf-8 flag, otherwise it will encode the output in the default cp1252 encoding, and you run the risk of losing information.

Note that xls2csv also only works with .xls files, it does not work with .xlsx files.

Flimm
  • 136,138
  • 45
  • 251
  • 267
mpowered
  • 13,162
  • 2
  • 15
  • 18
  • Or choose a different output encoding, can't fix everyone's problems with a single answer :-) – mpowered Mar 18 '16 at 17:40
  • 2
    I know it's unfortunate, but the fact is that this answer *will* cause problems for some users. The fact that you've only ever run this with characters that fit in cp1252 is just lucky, you provided no warning about this risk in the answer, you just said that it "worked without a hitch". You've not been asked to fix everyone's problems, just OP's, which a lot of people share. – Flimm Mar 20 '16 at 20:24
  • 1
    There you go, I fixed the issue in the answer post and removed the downvote. – Flimm Mar 20 '16 at 20:30
7

Easiest way: no need for OpenOffice or Google Docs

  1. Save your file as a "Unicode text file";
  2. now you have a Unicode text file
  3. open it with "notepad" and "Save as" it, selecting "utf-8" or another code page that you want
  4. rename the file extension from "txt" to "csv". This will result in a tab-delimited UTF-8 csv file.
  5. If you want a comma-delimited file, open the csv file you just renamed and replace all tabs with commas. To do this in Notepad on Win 10, simply select one tab field then press Ctrl+H. In the window that opens, type a comma , in the "Replace with" field then click "Replace All". Save your file. The result will be a comma-delimited UTF-8 csv file.

Don't open it with MS Office again, though! Now you have a tab-delimited CSV file, or a comma-delimited one if you applied step 5.

Randomize
  • 8,651
  • 18
  • 78
  • 133
Solivan
  • 695
  • 1
  • 8
  • 16
  • 2
    For a tab delimited file it may be better to use a `.txt` extension. `csv`, i.e. comma separated files, is just confusing. – dof1985 Apr 05 '17 at 12:42
5

As funny as it may seem, the easiest way I found to save my 180MB spreadsheet into a UTF8 CSV file was to select the cells into Excel, copy them and to paste the content of the clipboard into SublimeText.

oscaroscar
  • 1,158
  • 11
  • 16
  • 1
    This works because Excel provides a [TSV](https://en.wikipedia.org/wiki/Tab-separated_values) version of the selection via the clipboard. You can equally well use Notepad instead of SublimeText but just remember to save with UTF-8 encoding if you do! – Don Cruickshank Mar 20 '19 at 18:50
3

Assuming a Windows environment, save and work with the file as usual in Excel, but then open the saved Excel file in Gnome Gnumeric (free). Save Gnome Gnumeric's spreadsheet as CSV, which - for me anyway - saves it as UTF-8 CSV.

3

Easy way to do it: download OpenOffice (here), load the spreadsheet, and open the Excel file (.xls or .xlsx). Then just save it as a text CSV file; a window opens asking whether to keep the current format or to save it in ODF format. Select "keep the current format" and in the new window select the option that works best for you, according to the language your file is written in. For Spanish, select Western Europe (Windows-1252/WinLatin 1) and the file works just fine. If you select Unicode (UTF-8), it is not going to work with the Spanish characters.

user35443
  • 6,309
  • 12
  • 52
  • 75
Yessus
  • 31
  • 1
3
  1. Save the xls file (Excel file) as Unicode text => the file will be saved in text format (.txt)

  2. Change the extension from .txt to .csv (rename the file from XYX.txt to XYX.csv)

Mena
  • 55
  • 1
3

I also came across the same problem, but there is an easy solution for it.

  1. Open your xlsx file in Excel 2016 or higher.
  2. In "Save As" choose this option: "(CSV UTF-8(Comma Delimited)*.csv)"

It works perfectly, and a csv file is generated which can be imported into any software. I imported this csv file into my SQLite database and it works perfectly, with all Unicode characters intact.
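
As an illustration of that last step (a sketch only: the helper name, the table name, and the first-row-as-header assumption are mine, not the answer's), such a UTF-8 CSV can be imported into SQLite with Python. Note `utf-8-sig`, which strips the BOM that Excel writes at the start of its UTF-8 CSVs:

```python
import csv
import sqlite3

def load_csv_into_sqlite(csv_path, conn, table="data"):
    """Load a UTF-8 CSV (e.g. from Excel's "CSV UTF-8" save type) into a
    SQLite table, treating the first row as the header and every column
    as TEXT. A simplified sketch, not production code."""
    with open(csv_path, encoding="utf-8-sig") as f:  # utf-8-sig strips Excel's BOM
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join('"{}" TEXT'.format(h) for h in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
        marks = ", ".join("?" for _ in header)
        conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, marks), reader)
    conn.commit()
```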

BSMP
  • 4,596
  • 8
  • 33
  • 44
Krish
  • 379
  • 2
  • 8
3

I was not able to find a VBA solution for this problem on Mac Excel. There simply seemed to be no way to output UTF-8 text.

So I finally had to give up on VBA, bit the bullet, and learned AppleScript. It wasn't nearly as bad as I had thought.

Solution is described here: http://talesoftech.blogspot.com/2011/05/excel-on-mac-goodbye-vba-hello.html

anroy
  • 73
  • 6
2

Came across the same problem and googled my way to this post. None of the above worked for me. In the end I converted my Unicode .xls to .xml (choose Save as ... XML Spreadsheet 2003) and it produced the correct characters. Then I wrote code to parse the xml and extract the content for my use.

2

I have written a small Python script that can export worksheets in UTF-8.

You just have to provide the Excel file as first parameter followed by the sheets that you would like to export. If you do not provide the sheets, the script will export all worksheets that are present in the Excel file.

#!/usr/bin/env python3

# Export data sheets from xlsx to csv (one UTF-8 CSV file per sheet).

import csv
import sys

from openpyxl import load_workbook


def get_all_sheets(excel_file):
    workbook = load_workbook(excel_file, read_only=True, data_only=True)
    return workbook.sheetnames


def csv_from_excel(excel_file, sheets):
    workbook = load_workbook(excel_file, read_only=True, data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")

        try:
            worksheet = workbook[worksheet_name]
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)

        # newline='' lets the csv module control line endings; the "with"
        # block ensures every sheet's file is closed, not just the last one.
        with open(worksheet_name + '.csv', 'w', encoding='utf-8', newline='') as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            for row in worksheet.iter_rows():
                wr.writerow([cell.value for cell in row])
        print(" ... done")


if not 2 <= len(sys.argv) <= 3:
    print("Call with " + sys.argv[0] + " <xlsx file> [comma separated list of sheets to export]")
    sys.exit(1)

if len(sys.argv) == 3:
    sheets = sys.argv[2].split(',')
else:
    sheets = get_all_sheets(sys.argv[1])
assert sheets
csv_from_excel(sys.argv[1], sheets)
Scarabee
  • 5,437
  • 5
  • 29
  • 55
Julian
  • 1,694
  • 22
  • 29
  • I have updated the implementation a bit and created a gist https://gist.github.com/julianthome/2d8546e7bed869079ab0f409ae0faa87 – Julian Feb 06 '17 at 11:48
2

Excel typically saves a csv file in ANSI encoding instead of UTF-8.

One option to correct the file is to use Notepad or Notepad++:

  1. Open the .csv with Notepad or Notepad++.
  2. Copy the contents to your computer clipboard.
  3. Delete the contents from the file.
  4. Change the encoding of the file to utf8.
  5. Paste the contents back from the clipboard.
  6. Save the file.
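
The same re-encoding can be scripted, for example in Python (a sketch; the helper name and filenames are placeholders, and it assumes the file really is cp1252, Excel's usual choice on Western-locale Windows):

```python
def reencode_cp1252_to_utf8(src_path, dst_path):
    """Re-save a cp1252 CSV (as written by Excel) as UTF-8."""
    with open(src_path, encoding="cp1252") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

As noted in other answers, this cannot recover characters Excel already dropped when it wrote the cp1252 file; it only prevents further mangling downstream.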
Jason Williams
  • 2,740
  • 28
  • 36
  • Not sure about old versions of NP++, but in the current version you can just select Encoding > Convert to UTF-8. replaces steps 2-5 – Felk Mar 06 '18 at 12:57
1

Encoding -> Convert to ANSI will encode the file in ANSI, not UTF-8. Perhaps some characters will be encoded correctly in ANSI, but here we are talking about UTF-8, @SequenceDigitale.

There are faster ways, such as exporting as CSV (comma delimited) and then opening that CSV with Notepad++ (free) and using Encoding > Convert to UTF-8. But that only makes sense if you have to do this once per file. If you need to change and export frequently, then the best option is the LibreOffice or GDocs solution.

malenkiy_scot
Lucas
  • "Utf8 is a subset of Unicode": this makes no sense. UTF-8 is an *encoding* of Unicode. – jameshfisher Mar 10 '14 at 18:47
  • I'm not sure I can find this "Encoding -> Convert to Ansi" option. – Flimm Mar 18 '16 at 09:43
  • Simply opening an ANSI CSV in Notepad++ and changing the encoding of the file to UTF-8 causes the original non-ASCII Spanish characters to be converted into encoded strings which do not match the original text. – Jason Williams Nov 28 '17 at 15:22
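The cp1252 losses these comments describe are easy to check from Python; this quick sketch (my own illustration, not from any answer) tests whether a string survives an "ANSI" encoding:

```python
def encodes_in_cp1252(s):
    """True if every character in s fits in Windows-1252 ('ANSI')."""
    try:
        s.encode('cp1252')
        return True
    except UnicodeEncodeError:
        return False

# Spanish accents and Mac-style smart punctuation do fit in cp1252...
print(encodes_in_cp1252('José — “año”'))   # True
# ...but Greek (or anything outside cp1252's 256 code points) does not.
print(encodes_in_cp1252('Ωμέγα'))          # False
```

This is why saving as plain CSV from Excel appears to work for some files and silently mangles others.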
1

Microsoft Excel has an option to export the spreadsheet using Unicode encoding. See the following screenshot.

(screenshot: Excel's Save As dialog with the "Unicode Text" format selected)

vladaman
  • "Unicode Text" saves in UTF-16 LE (Little Endian), not in UTF-8 as asked by the OP. – ax. Jul 18 '12 at 07:22
  • Yes, but it is the best way to get Unicode support for your x-separated-values in Excel. I've had all sorts of issues trying to get Excel to play ball with UTF-8! [Read More](http://www.herongyang.com/Unicode/Excel-Using-Microsoft-Excel-as-Unicode-Text-Editor.html) – mcNux Apr 24 '13 at 16:15
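Since "Unicode Text" comes out as UTF-16, tab-delimited, one short conversion step yields the UTF-8 CSV the question asks for. A sketch, with the function name and file paths being my own assumptions:

```python
import csv

def unicode_text_to_utf8_csv(src_path, dst_path):
    """Convert Excel's 'Unicode Text' export (UTF-16, tab-delimited)
    into a UTF-8, comma-separated file."""
    # The 'utf-16' codec consumes the BOM that Excel writes at the start.
    with open(src_path, 'r', encoding='utf-16', newline='') as src, \
         open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter='\t'):
            writer.writerow(row)

# Example call (hypothetical filenames):
# unicode_text_to_utf8_csv('Book1.txt', 'Book1.csv')
```

Because UTF-16 covers all of Unicode, this route is lossless, unlike saving straight to CSV in ANSI.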
1

Open the .csv file with Notepad++. If your encoding is good (you see all characters as they should be), press Encoding, then Convert to ANSI; otherwise, find out what your current encoding is first.

Marius Gri
  • that worked for me.. had problem with greek characters when exportedmysql db as csv and imported it to excel.. – nikolas May 24 '13 at 09:47
  • The problem with this is how you generate the CSV file in the first place. If you simply save as CSV file in Excel, it will save it in cp1252, which is a one-byte-per-code-point encoding, and therefore loses information. – Flimm Mar 18 '16 at 10:01
1

Another solution is to open the file in WinWord and save it as .txt, then reopen it in Excel, and it will work, God willing.

1

Save Dialog > Tools Button > Web Options > Encoding Tab

Elia Weiss
1

An alternative to "nevets1219"'s answer is to open your CSV file in Notepad++ and do a conversion to ANSI.

Choose in the top menu : Encoding -> Convert to Ansi

SequenceDigitale.com
  • No idea why you were downvoted. Notepad++ did it for me. Can't store my file in Google Spreadsheet as it is confidential. – Zane Mar 28 '13 at 17:55
  • The trouble with this answer is how you generate the CSV file in the first place. If you simply save as CSV from Excel, the encoding will be cp1252, which is a one-byte-per-code-point encoding, and therefore will lose information for characters that don't fit into that. Also, at the end, you should convert to UTF-8, and not to Ansi, if you want to do what the question asked for. – Flimm Mar 18 '16 at 10:02
  • This is by far the easier answer IMO. I'm using Excel 2016 and found it save to ANSI encoding by default, but got it into UTF-8 which is what I wanted. – rovyko Mar 15 '18 at 02:19
0

I had the same problem and came across this add-in. It works perfectly fine in Excel 2013, besides Excel 2007 and 2010, for which it is mentioned.

academic.user
0

I needed to automate this process on my Mac. I originally tried using catdoc/xls2csv as suggested by mpowered, but xls2csv had trouble detecting the original encoding of the document and not all documents were the same. What I ended up doing was setting the default webpage output encoding to be UTF-8 and then providing the files to Apple's Automator, applying the Convert Format of Excel Files action to convert to Web Page (HTML). Then using PHP, DOMDocument and XPath, I queried the documents and formatted them to CSV.

This is the PHP script (process.php):

<?php
// Convert one HTML table (an Excel "Web Page" export) into a CSV file
// written next to the source, with the same base name.
$pi = pathinfo($argv[1]);
$file = $pi['dirname'] . '/' . $pi['filename'] . '.csv';
$fp = fopen($file, 'w+');
$doc = new DOMDocument;
$doc->loadHTMLFile($argv[1]);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//tr') as $row) {
    $_r = [];
    // 'td' is evaluated relative to the current row node.
    foreach ($xpath->query('td', $row) as $col) {
        $_r[] = trim($col->textContent);
    }
    fputcsv($fp, $_r);
}
fclose($fp);
?>

And this is the shell command I used to convert the HTML documents to csv:

find . -name '*.htm' | xargs -I{} php ./process.php {}

This is a really, really roundabout way of doing this, but it was the most reliable method that I found.

phuclv
Kyle
-1

Another way is to open the UTF-8 CSV file in Notepad, where it will be displayed correctly. Then replace all the "," with tabs, and paste all of this into a new Excel file.

Martin
  • The question is about converting an Excel file to a UTF-8 encoded CSV file. This answer starts with a UTF-8 encoded CSV file! – Flimm Mar 18 '16 at 09:41
-1

And for those who have Sublime Text: saving with the encoding UTF-16 LE with BOM should do it ;-)

manuchap
-1

(On a Mac:) From Excel save as CSV file. Open the CSV file in TextWrangler (it’s free) and use "save as". In the save dialog choose Unicode (UTF-8). Done

(I guess you can do this with TextEdit also - if you play with the open and save settings. Try Open files: auto, Save files: UTF-8)

eye-wonder
  • The trouble with this solution is that it is lossy. If you have code points that don't fit in a one-byte-per-code-point encoding, then those characters will get lost in the transition. – Flimm Mar 18 '16 at 09:45
-3

You can save the Excel file as Unicode Text; it is tab-delimited.

Marko
  • It is an answer in the sense that it results in a tab-delimited (rather than comma-delimited) file, which is something most people (including me) can work with. – jogojapan Aug 31 '15 at 07:22
  • "Unicode text" in Excel means UTF-16, and not UTF-8, which is what the question asked for. – Flimm Mar 18 '16 at 09:48
  • This is exactly what worked for me. The question is, for most people, not whether the file is encoded in UTF-8 but whether the 'special' characters are preserved. The fact that Excel doesn't do this, in 2016, forces us to jump through hoops. – Henry Rusted Dec 08 '16 at 12:35
-3

I used the following solution: in Mac Excel 2008, go to File > Save As and then, under Format, choose MS-DOS Comma Separated (.csv). Worked perfectly.

kikaDESIGN