32

I want to write a Python script that reads in an Excel spreadsheet and saves some of its worksheets as CSV files.

How can I do this?

I have found third-party modules for reading and writing Excel files from Python, but as far as I can tell, they can only save files in Excel (i.e. *.xls) format. If I'm wrong here, some example code showing how to do what I'm trying to do with these modules would be appreciated.

I also came across one solution that I can't quite understand, but seems to be Windows-specific, and therefore would not help me anyway, since I want to do this in Unix. At any rate, it's not clear to me that this solution can be extended to do what I want to do, even under Windows.

Mel
  • 5,837
  • 10
  • 37
  • 42
kjo
  • 33,683
  • 52
  • 148
  • 265

5 Answers5

64

The most basic examples using the two libraries described line by line:

  1. Open the xls workbook
  2. Reference the first spreadsheet
  3. Open in binary write the target csv file
  4. Create the default csv writer object
  5. Loop over all the rows of the first spreadsheet
  6. Dump the rows into the csv

import xlrd
import csv

with xlrd.open_workbook('a_file.xls') as wb:
    sh = wb.sheet_by_index(0)  # or wb.sheet_by_name('name_of_the_sheet_here')
    with open('a_file.csv', 'wb') as f:   # open('a_file.csv', 'w', newline="") for python 3
        c = csv.writer(f)
        for r in range(sh.nrows):
            c.writerow(sh.row_values(r))

import openpyxl
import csv

wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active
with open('test.csv', 'wb') as f:  # open('test.csv', 'w', newline="") for python 3
    c = csv.writer(f)
    for r in sh.rows:
        c.writerow([cell.value for cell in r])
Sayyor Y
  • 1,130
  • 2
  • 14
  • 27
Zeugma
  • 31,231
  • 9
  • 69
  • 81
  • 1
    To evaluate the Excel formulas with `openpyxl` : `wb = openpyxl.load_workbook('test.xlsx', data_only=True)` – Leonid Dec 02 '21 at 11:44
  • @Zeugma how can I write this csv back to a folder? (In my case aws s3) I keep getting AttributeError: '_io.TextIOWrapper' object has no attribute 'save' – pdangelo4 Jan 19 '22 at 00:34
18

Using pandas will be a bit shorter:

import pandas as pd

df = pd.read_excel('my_file', sheet_name='my_sheet_name')  # sheet_name is optional
df.to_csv('output_file_name', index=False)  # index=False prevents pandas from writing a row index to the CSV.

# oneliner
pd.read_excel('my_file', sheetname='my_sheet_name').to_csv('output_file_name', index=False)
zachaysan
  • 1,726
  • 16
  • 32
FabienP
  • 3,018
  • 1
  • 20
  • 25
16

As of December 2021 and Python 3:

The openpyxl API has changed sufficiently (see https://openpyxl.readthedocs.io/en/stable/usage.html) that I have updated this part of the answer by @Boud (now @Zeugma?), as follows:

import openpyxl
import csv

wb = openpyxl.load_workbook('test.xlsx')
sh = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as file_handle:
    csv_writer = csv.writer(file_handle)
    for row in sh.iter_rows(): # generator; was sh.rows
        csv_writer.writerow([cell.value for cell in row])

@Leonid made some helpful comments - in particular:

csv.writer provides some additional options e.g. custom delimiter:

csv_writer = csv.writer(fout, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)

HTH

jtlz2
  • 7,700
  • 9
  • 64
  • 114
  • 2
    A couple of typos here. The "with" needs "as f" on the end, and "sh.iter_rows" should be "sh.iter_rows()" Otherwise, works well, thanks! – eakst7 May 12 '21 at 15:44
  • 1
    @eakst7 Huge thanks - can you believe I typed it out - now fixed - glad it helped. – jtlz2 May 12 '21 at 18:00
  • 1
    Thanks, that was useful. Two comments from me: 1. `pylama` does not like single-letter variable names and the call to `csv.writer` provides additional options (such as custom delimiter) which would be cool to highlight. For example: `csv_writer = csv.writer(fout, delimiter='|', quotechar='"', quoting=csv.QUOTE_MINIMAL)` – Leonid Dec 02 '21 at 10:44
  • 1
    @Leonid Thanks so much - updated as per your helpful comments! – jtlz2 Dec 02 '21 at 11:00
5

Use the xlrd or openpyxlmodule to read xls or xlsx documents respectively, and the csv module to write.

Alternately, if using Jython, you can use the Apache POI library to read either .xls or .xlsx, and the native CSV module will still be available.

Charles Duffy
  • 280,126
  • 43
  • 390
  • 441
  • 3
    And if you need to read .xlsx files use [openpyxl](http://packages.python.org/openpyxl/). – Steven Rumbalski May 29 '12 at 16:02
  • I prefer [`xlsxrd`](https://groups.google.com/forum/#!msg/python-excel/J4pKxEWGOd0/-Nm23guVH84J) to read `.xlsx` files. At some point, it will be merged into `xlrd`. – John Y May 29 '12 at 16:34
0

First read your Excel spreadsheet into Pandas. The code below will import your Excel spreadsheet into Pandas as an OrderedDict which contains all of your worksheets as DataFrames. Then, simply use the worksheet_name as a key to access specific worksheet as a DataFrame and save only the required worksheet as a csv file by using df.to_csv(). Hope this will work in your case.

import pandas as pd
df = pd.read_excel('YourExcel.xlsx', sheet_name=None)
df['worksheet_name'].to_csv('output.csv')  
akshayk07
  • 2,092
  • 1
  • 20
  • 32
Ashu007
  • 745
  • 1
  • 9
  • 13