367

I'm looking for a way to convert xlsx files to csv files on Linux.

I do not want to use PHP, Perl, or anything like that: I need to process several million lines, so the tool has to be quick. I found a program in the Ubuntu repositories called xls2csv, which I'm currently using, but it only converts xls (Office 2003) files and I need support for the newer Excel format.

Any ideas?

Matthias Braun
user1390150
  • 11
    Thinking that anything implemented with a scripting language is going to be slow by nature seems... a little misguided, particularly since the interesting libraries in those languages tend to have backends written in C. – Charles Duffy May 11 '12 at 19:34
  • 2
    Excel used to be limited to 65536 rows. Now it's 1,048,576 (http://support.microsoft.com/kb/120596). It's going to be tough to fit "several millions of lines" in it. Just saying... – Pavel Veller May 11 '12 at 19:35
  • 1
    @Pavel could be over several files. – Charles Duffy May 11 '12 at 19:38
  • 2
    ...personally, I'd do this using the xlsv library for Python, but since scripting-based approaches are described as out of the question... *shrug*. (How is it a programming question if programmatic tools are excluded from the answer?) – Charles Duffy May 11 '12 at 19:39
  • 1
    @CharlesDuffy I'm currently using a PHP library to do this, and what takes xls2csv 1 second to do, takes php 10 minutes to do. Literally. – user1390150 May 11 '12 at 19:41
  • @user1390150 I believe it -- PHP is a raving pile of... well. But just because you have a really bad PHP library isn't a valid reason to write off all interpreted languages. – Charles Duffy May 11 '12 at 20:07
  • 1
    (err, that library name should have been xslw, not xlsv) – Charles Duffy May 11 '12 at 20:08
  • Related: https://unix.stackexchange.com/questions/23726/convert-a-xlsx-ms-excel-file-to-csv-on-command-line-with-semicolon-separated – Ciro Santilli OurBigBook.com Dec 14 '20 at 10:00

12 Answers

334

The Gnumeric spreadsheet application comes with a command line utility called ssconvert that can convert between a variety of spreadsheet formats:

$ ssconvert Book1.xlsx newfile.csv

Using exporter Gnumeric_stf:stf_csv

$ cat newfile.csv

Foo,Bar,Baz
1,2,3
123.6,7.89,
2012/05/14,,
The,last,Line

To install on Ubuntu:

apt-get install gnumeric

To install on Mac:

brew install gnumeric
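
For batch work, the single-file command above can be wrapped in a loop. The following is a minimal sketch, not part of Gnumeric: `xlsx_to_csv_all` is a hypothetical helper name, and it assumes `ssconvert` is on the PATH and that each CSV should be written next to its source file.

```shell
# Convert every .xlsx file in a directory to a .csv file beside it.
# xlsx_to_csv_all is a hypothetical helper name (an assumption for this sketch).
# Usage: xlsx_to_csv_all [directory]   (defaults to the current directory)
xlsx_to_csv_all() {
    dir=${1:-.}
    for f in "$dir"/*.xlsx; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # Strip the .xlsx suffix and append .csv for the output name.
        ssconvert "$f" "${f%.xlsx}.csv" || return 1
    done
}
```

Quoting `"$f"` keeps file names with spaces intact, and the function stops at the first file that `ssconvert` fails to convert.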
Peter Mortensen
jmcnamara
  • 25
    Really the most hassle-free method of converting spreadsheets. Combined with a bash script, it will let you batch-process multiple files. `for f in *.csv; do ssconvert "$f" "${f%.csv}.xlsx"; done` The LibreOffice method could probably process other formats, but I could not make it work (it would simply open a blank file every time, even with the `--headless` argument). – sleblanc Aug 15 '13 at 17:24
  • 9
    @sebleblanc Not quite hassle-free. The installation is a pain given the number of dependencies (if you're doing this on a headless server). So far gcc, intltool, zlib-devel, GTK... GTK requires glib, atk, pango, cairo, cairo-object, gdk-pixbuf-2.0... – andrewtweber Feb 14 '14 at 18:03
  • It seems that there is no way to prevent ssconvert from recomputing the sheet before converting it to csv... See http://stackoverflow.com/questions/22344918/how-to-prevent-ssconvert-to-recalculate-the-excel-file-before-conversion?noredirect=1#comment33963480_22344918 – RockScience Mar 12 '14 at 10:43
  • 15
    I managed to install it on a headless debian server with `apt-get install gnumeric --no-install-recommends`. The only drawback is that it fires lots of warnings **GConf-WARNING **: Client failed to connect to the D-BUS daemon** when running. A simple `ssconvert oldfile.xlsx newfile.csv > /dev/null 2>&1` will do the trick. – Benjamin Delichere Mar 18 '14 at 11:05
  • im not finding ssconvert for cygwin: http://screencast.com/t/4GgSOJNLa http://screencast.com/t/seIb1GHKOI – Alex Gordon Mar 23 '14 at 04:34
  • i also tried installing gnumeric on windows and still nothing http://screencast.com/t/4QcjZVXHoh ... its a joke i guess? – Alex Gordon Mar 23 '14 at 04:40
  • 2
    @Yuck - Not sure of what your screen-shot wants to illustrate (it only shows that you don't have the binary in your cygwin PATH) but I've just tried native `gnumeric-1.12.17-20140610.exe` (no cygwin) and it works flawlessly. – Álvaro González Aug 19 '14 at 10:45
  • 11
    To write to csv you may want the `-S` flag to write multiple sheets. Each goes to its own file. – Ed Avis Feb 25 '15 at 15:37
  • I've tried using this in Ubuntu 12.04 but get the following error: {ssconvert test.xlsb newfile.csv Using exporter Gnumeric_stf:stf_csv ** (ssconvert:4973): WARNING **: Format Gnumeric_Excel:xlsx's probe changed input ref_count from 1 to 3. ** (ssconvert:4973): WARNING **: Format Gnumeric_Excel:xlsx's probe changed input ref_count from 3 to 5. E Unsupported file format.} Any ideas? – 2one Dec 06 '15 at 20:36
  • 1
    How can you specify the separator? `$ ssconvert -O 'separator=;' file.csv file.xlsx` or `$ ssconvert -O 'separator=; format=raw' file.csv file.xlsx` do not work. – hhh Jan 26 '17 at 10:35
  • 7
    @hhh The separator option only works with txt export type. You can use this to print to stdout: `ssconvert -O "separator=;" -T Gnumeric_stf:stf_assistant file.xlsx fd://1`. – exic Sep 05 '17 at 10:52
  • We should probably accept this answer as the recommended solution. – Pramit Aug 19 '19 at 02:29
  • Under macOS I had to use like this: `ssconvert --export-type=Gnumeric_stf:stf_csv Book1.xlsx newfile.csv` – neowinston Apr 21 '20 at 14:03
  • 1
    The bash script of @sleblanc works fine, but does the inverse of what is asked for. – ksyrium Jun 17 '22 at 08:15
  • 1
    @ksyrium oopsies! – sleblanc Jun 18 '22 at 14:34
  • oh i just wished ssconvert was installable as a separate app! I was just trying to do this on wsl2 debian, and i cant do this, dont want all those gnome dependencies to install for my little work. A little python script will be best for this. – user734028 Aug 12 '22 at 06:05
181

You can do this with LibreOffice:

libreoffice --headless --convert-to csv "$filename" --outdir "$outdir"

For reasons not clear to me, you might need to run this with sudo. You can make LibreOffice work with sudo without requiring a password by adding this line to your sudoers file:

users ALL=(ALL) NOPASSWD: libreoffice
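
The comments on this answer report that the plain `csv` target can break non-ASCII characters and suggest passing explicit filter options instead. A minimal sketch of a wrapper that does so, run as a normal user, might look like the following. `lo_xlsx_to_csv` is a hypothetical name, and the filter string is taken from that comment rather than verified against every LibreOffice version.

```shell
# lo_xlsx_to_csv is a hypothetical wrapper name (an assumption for this sketch).
# Usage: lo_xlsx_to_csv input.xlsx [output-directory]
lo_xlsx_to_csv() {
    in=$1
    outdir=${2:-.}
    # Filter options as quoted in the comment thread:
    # 44 = comma field separator, 34 = double-quote text delimiter,
    # 76 = UTF-8 character set.
    filter='csv:Text - txt - csv (StarCalc):44,34,76,1,1/1'
    libreoffice --headless --convert-to "$filter" "$in" --outdir "$outdir"
}
```

For example, `lo_xlsx_to_csv "report 2014.xlsx" out` would ask LibreOffice to write the CSV into the `out` directory.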
spiffytech
  • 38
    how would I tell libreoffice that I want the second sheet? – dmeu May 08 '13 at 07:30
  • 35
    Allowing sudo to libreoffice for everyone without password is opening a can of worms. Please beware of the consequences, including the possibility to acquiring root permissions on a multi-user platform – Interarticle Aug 01 '13 at 07:42
  • 6
    this worked for me (sudo not required). My version: libreoffice-calc-3.6.7.2-4.fc18.x86_64 – Brad Hein Jan 08 '14 at 16:32
  • 2
    I just tried and at first I needed sudo. It turned out that was because my current user had an instance of libreoffice running (playing an ods). After closing it, no sudo required. – Rémi Feb 06 '14 at 21:48
  • 8
    `/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to csv $filename` worked on OS X for me. – Nobu Jun 10 '14 at 20:52
  • This method is good but AFAIK it will break non-ascii characters – Konstantin V. Salikhov Aug 13 '14 at 08:46
  • 3
    Make sure all libreoffice instances are closed otherwise it won't work. – Umair A. Dec 31 '14 at 11:10
  • 1
    `libreoffice --convert-to` works well to convert between one spreadsheet format and another (I use it to read .xlsb files, by converting them to .xls first). But for writing to CSV, it is limited to outputting the first sheet only. – Ed Avis Feb 25 '15 at 14:01
  • 16
    To convert to utf-8, preserving non-ascii characters, use instead `--convert-to "csv:Text - txt - csv (StarCalc):44,34,76,1,1/1"`. See [open office wiki](https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options) for details. – Aryeh Leib Taurog Jul 14 '15 at 17:30
  • How do I read multiple sheets? I.e. i want to read sheet 7 only and create a .csv. – 2one Dec 06 '15 at 20:44
  • 2
    @dmeu you don't. You use another tool like `xlsx2csv` if you need that. The `xlsx2csv` tool has the `-s` or `--sheet` option which you can use to select the sheet (0 stands for "all sheets" and the default is 1). `xlsx2csv` is packaged in popular Linux distributions like Debian, Ubuntu and Arch Linux. – josch May 04 '16 at 06:01
  • @dmeu: use [this script](https://github.com/colonelqubit/libreconverter). Works like a charm: `./libreconverter.py Spreadsheet.xls:"Sheet Name" output.csv`. – Benoit Duffez Dec 16 '16 at 10:24
  • I had encoding problems in the csv after using this answer, the `ssconvert` answer solved the issue. – yishaiz Mar 14 '17 at 13:10
  • There is no reason for libreoffice to require root shell, and it could be easily debugged, where it fails with a user account. – peterh May 13 '19 at 12:59
179

If you already have a desktop environment then I'm sure Gnumeric or LibreOffice would work well, but on a headless server (e.g. any cloud-based environment), they require dozens of dependencies that you also need to install.

I found this Python alternative: xlsx2csv

easy_install xlsx2csv
xlsx2csv file.xlsx > newfile.csv

It took two seconds to install and works like a charm.

If you have multiple sheets, you can export all at once, or one at a time:

xlsx2csv file.xlsx --all > all.csv
xlsx2csv file.xlsx --all -p '' > all-no-delimiter.csv
xlsx2csv file.xlsx -s 1 > sheet1.csv

The author of xlsx2csv also links to several alternatives built in Bash, Python, Ruby, and Java.
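
To get one CSV file per sheet, the `-s` option shown above can be called in a loop. This is only a sketch: `export_sheets` is a hypothetical helper name, and the caller has to supply the number of sheets because the sketch does not detect it.

```shell
# export_sheets is a hypothetical helper name (an assumption for this sketch).
# Usage: export_sheets workbook.xlsx NUMBER_OF_SHEETS
# Writes workbook-sheet1.csv, workbook-sheet2.csv, ... next to the workbook.
export_sheets() {
    file=$1
    count=$2
    base=${file%.xlsx}
    i=1
    while [ "$i" -le "$count" ]; do
        # -s selects the sheet by its 1-based index.
        xlsx2csv -s "$i" "$file" > "${base}-sheet${i}.csv" || return 1
        i=$((i + 1))
    done
}
```

For example, `export_sheets file.xlsx 3` would produce `file-sheet1.csv` through `file-sheet3.csv`.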

andrewtweber
  • Works great, but I can get it to run only with sudo (`IOError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/prettytable-0.7.2-py2.7.egg/EGG-INFO/top_level.txt'`). Now that I think about it, I got the same error with `csvkit`. – user2105469 May 28 '14 at 17:24
  • 2
    ....Was working great for me and allowing the extraction of each sheet to individual files using the -s option -- where libreoffice was not able to handle the size of the sheet, xlsx2csv had no problems – Soren May 29 '14 at 18:20
  • Thanks! Very convenient in ubuntu. – zhuguowei Nov 16 '15 at 07:01
  • 12
    In Debian and Ubuntu there is the `xlsx2csv` package, so you don't need to manually install it through `easy_install` but can use your package manager. – josch May 04 '16 at 06:04
  • 1
    On MacOS you will need a `sudo easy_install xlsx2csv` – Frank Hintsch Jan 07 '20 at 17:16
  • seems there is no `brew` formula for this, so I used `csvkit` from the next answer – WestCoastProjects Nov 28 '22 at 04:28
  • xls2csv mixed up a lot of column values in my case. – Stephan Jan 11 '23 at 15:32
  • 2
    I have no idea how robust or feature-complete `xlsx2csv` is but it seems to be actively maintained and compared to installing Gnumeric on macOS via Homebrew which involves more than 30 dependencies and LibreOffice which is a several hundred MB download `xlsx2csv` has zero(!) dependencies, comes at just 50 KB and worked perfectly for my use case (converting the output of PaddleOCR to csv). Either install it with `pip install xlsx2csv` or download the latest [release](https://github.com/dilshod/xlsx2csv/releases) from the [Repository](https://github.com/dilshod/xlsx2csv) and run `xlsx2csv.py`. – Stefan Schmidt Apr 12 '23 at 22:51
60

Use csvkit:

in2csv data.xlsx > data.csv

For details, check their excellent documentation.

Peter Mortensen
Holger Brandl
50

In Bash, I used this LibreOffice command (executable libreoffice) to convert all my .xlsx files in the current directory:

for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done

Close all your LibreOffice open instances before executing, or it will fail silently.

The command takes care of spaces in the filename.
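
Because a failed conversion is silent, it can help to check that each expected CSV file actually exists afterwards. The sketch below assumes that LibreOffice names the output after the input with a `.csv` extension and writes it into the directory given by `--outdir`. `convert_checked` is a hypothetical helper name.

```shell
# convert_checked is a hypothetical helper name (an assumption for this sketch).
# Usage: convert_checked file1.xlsx [file2.xlsx ...]
# Returns non-zero if any expected .csv file is missing afterwards.
convert_checked() {
    status=0
    for f in "$@"; do
        # Write the CSV next to the input file and hide LibreOffice's messages.
        libreoffice --headless --convert-to csv "$f" \
            --outdir "$(dirname "$f")" >/dev/null 2>&1
        out=${f%.xlsx}.csv
        if [ ! -e "$out" ]; then
            echo "no output produced for: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `convert_checked *.xlsx` then reports every file for which no CSV appeared instead of failing quietly.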

I tried it again some years later, and it didn't work. This question gives some tips, but the quickest solution was to run it as root (or with sudo libreoffice). It is not elegant, but it is quick.

Use the command scalc.exe in Windows.

Peter Mortensen
neves
13

Another option would be to use R via a small Bash wrapper for convenience:

xlsx2txt(){
# Read the first sheet of the workbook given as $1 and print it as tab-separated text.
echo '
require(xlsx)
write.table(read.xlsx2(commandArgs(TRUE)[1], 1), stdout(), quote=FALSE, row.names=FALSE, col.names=TRUE, sep="\t")
' | Rscript --vanilla - "$1" 2>/dev/null
}

xlsx2txt file.xlsx > file.txt
Peter Mortensen
Holger Brandl
9

If the .xlsx file has many sheets, the -s flag can be used to get the sheet you want. For example:

xlsx2csv -s 2 "my_file.xlsx" second_sheet.csv

second_sheet.csv would contain the data of the second sheet in my_file.xlsx.

Peter Mortensen
Akavall
7

Using the Gnumeric spreadsheet application, which comes with a command-line utility called ssconvert, is indeed super simple:

find . -name '*.xlsx' -exec ssconvert -T Gnumeric_stf:stf_csv {} \;

and you're done!

4

If you are OK with running Java from the command line, you can do it with Apache POI HSSF's Excel Extractor. It has a main method that is described as the command-line extractor. This one seems to just dump everything out. They point to this example that converts to CSV. You would have to compile it before you can run it, but it too has a main method, so you should not have to do much coding per se to make it work.

Another option that might fly, but will require some work on the other end, is to have your Excel files come to you as Excel XML Data or XML Spreadsheet, or whatever MS calls that format these days. It will open a whole new world of opportunities for you to slice and dice it the way you want.

Pavel Veller
4

You can use the libreoffice executable to convert your .xlsx files to CSV:

libreoffice --headless --convert-to csv ABC.xlsx

The --headless argument indicates that no GUI is needed.

Peter Mortensen
Udesh
3

As others said, the libreoffice executable can convert Excel (.xls) files to CSV. The problem for me was the sheet selection.

This LibreOffice Python script does a fine job at converting a single sheet to CSV.

Usage is:

./libreconverter.py File.xls:"Sheet Name" output.csv

The only downside (on my end) is that --headless doesn't seem to work. I have a LibreOffice window that shows up for a second and then quits.

That's OK with me; it's the only tool that does the job rapidly.

Peter Mortensen
Benoit Duffez
0

You can use the script getsheets.py. Install its dependencies first:

pip3 install pandas xlrd openpyxl

Then call the script: python3 getsheets.py <file.xlsx>

Peter Mortensen
kaiya