How to save an Excel worksheet as CSV

Question

I want to write a Python script that reads in an Excel spreadsheet and saves some of its worksheets as CSV files.

How can I do this?

I have found third-party modules for reading and writing Excel files from Python, but as far as I can tell, they can only save files in Excel (i.e. *.xls) format. If I'm wrong here, some example code showing how to do what I'm trying to do with these modules would be appreciated.

I also came across one solution that I can't quite understand, but seems to be Windows-specific, and therefore would not help me anyway, since I want to do this in Unix. At any rate, it's not clear to me that this solution can be extended to do what I want to do, even under Windows.

score 64 · Accepted Answer · edited May 23 '21 at 04:16

The most basic examples using the two libraries described line by line:

Open the xls workbook
Reference the first spreadsheet
Open in binary write the target csv file
Create the default csv writer object
Loop over all the rows of the first spreadsheet
Dump the rows into the csv

import xlrd
import csv

with xlrd.open_workbook('a_file.xls') as wb:
    sh = wb.sheet_by_index(0)  # or wb.sheet_by_name('name_of_the_sheet_here')
    with open('a_file.csv', 'wb') as f:   # open('a_file.csv', 'w', newline="") for python 3
        c = csv.writer(f)
        for r in range(sh.nrows):
            c.writerow(sh.row_values(r))

import openpyxl
import csv

wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active
with open('test.csv', 'wb') as f:  # open('test.csv', 'w', newline="") for python 3
    c = csv.writer(f)
    for r in sh.rows:
        c.writerow([cell.value for cell in r])

To evaluate the Excel formulas with `openpyxl` : `wb = openpyxl.load_workbook('test.xlsx', data_only=True)` — Leonid, Dec 02 '21 at 11:44
@Zeugma how can I write this csv back to a folder? (In my case aws s3) I keep getting AttributeError: '_io.TextIOWrapper' object has no attribute 'save' — pdangelo4, Jan 19 '22 at 00:34

score 18 · Answer 2 · edited May 17 '23 at 11:53

18

Using pandas will be a bit shorter:

import pandas as pd

df = pd.read_excel('my_file', sheet_name='my_sheet_name')  # sheet_name is optional
df.to_csv('output_file_name', index=False)  # index=False prevents pandas from writing a row index to the CSV.

# oneliner
pd.read_excel('my_file', sheetname='my_sheet_name').to_csv('output_file_name', index=False)

edited May 17 '23 at 11:53

zachaysan

1,726
16
32

answered Jul 25 '17 at 08:09

FabienP

3,018
1
20
25

2

I don't trust pandas to do this. It's been converting all my leading zeros. – rrs Aug 24 '18 at 21:48
Can you give some more details? – FabienP Aug 25 '18 at 06:44
2

This implementation works perfectly fine for my scenario, just change `sheetname` to `sheet_name` as it's a typo. – Keivan Ipchi Hagh Apr 11 '21 at 07:48
@rrs why not use `pd.read_excel('my_file', dtype=str)` ? more info [here](https://stackoverflow.com/a/41662442/3517025) – Joey Baruch Aug 11 '21 at 14:48

jtlz2 · Answer 3 · 2021-12-02T11:06:56.223

16

As of December 2021 and Python 3:

The openpyxl API has changed sufficiently (see https://openpyxl.readthedocs.io/en/stable/usage.html) that I have updated this part of the answer by @Boud (now @Zeugma?), as follows:

import openpyxl
import csv

wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as file_handle:
    csv_writer = csv.writer(file_handle)
    for row in sh.iter_rows(): # generator; was sh.rows
        csv_writer.writerow([cell.value for cell in row])

@Leonid made some helpful comments - in particular:

csv.writer provides some additional options e.g. custom delimiter:

csv_writer = csv.writer(fout, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)

HTH

edited Dec 02 '21 at 11:06

answered Sep 28 '20 at 09:32

jtlz2

7,700
9
64
114

2

A couple of typos here. The "with" needs "as f" on the end, and "sh.iter_rows" should be "sh.iter_rows()" Otherwise, works well, thanks! – eakst7 May 12 '21 at 15:44
1

@eakst7 Huge thanks - can you believe I typed it out - now fixed - glad it helped. – jtlz2 May 12 '21 at 18:00
1

Thanks, that was useful. Two comments from me: 1. `pylama` does not like single-letter variable names and the call to `csv.writer` provides additional options (such as custom delimiter) which would be cool to highlight. For example: `csv_writer = csv.writer(fout, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)` – Leonid Dec 02 '21 at 10:44
1

@Leonid Thanks so much - updated as per your helpful comments! – jtlz2 Dec 02 '21 at 11:00

Charles Duffy · Answer 4 · 2012-05-29T16:34:02.643

5

Use the xlrd or openpyxlmodule to read xls or xlsx documents respectively, and the csv module to write.

Alternately, if using Jython, you can use the Apache POI library to read either .xls or .xlsx, and the native CSV module will still be available.

edited May 29 '12 at 16:34

answered May 29 '12 at 15:47

Charles Duffy

280,126
43
390
441

3

And if you need to read .xlsx files use [openpyxl](http://packages.python.org/openpyxl/). – Steven Rumbalski May 29 '12 at 16:02
I prefer [`xlsxrd`](https://groups.google.com/forum/#!msg/python-excel/J4pKxEWGOd0/-Nm23guVH84J) to read `.xlsx` files. At some point, it will be merged into `xlrd`. – John Y May 29 '12 at 16:34

score 0 · Answer 5 · edited Aug 01 '19 at 18:57

First read your Excel spreadsheet into Pandas. The code below will import your Excel spreadsheet into Pandas as an OrderedDict which contains all of your worksheets as DataFrames. Then, simply use the worksheet_name as a key to access specific worksheet as a DataFrame and save only the required worksheet as a csv file by using df.to_csv(). Hope this will work in your case.

import pandas as pd
df = pd.read_excel('YourExcel.xlsx', sheet_name=None)
df['worksheet_name'].to_csv('output.csv')

How to save an Excel worksheet as CSV

5 Answers5

Linked