I need to parse the first column of a document into a list ['item1', 'item2', ...], and this document can be:

  • a TXT document (items separated by newlines)
  • a CSV document with a single column (which is then equivalent to a TXT document)
  • a CSV with many columns, separated by ;
  • a CSV with many columns, separated by ,
  • an XLS with one or many columns
  • an XLSX with one or many columns

I was about to code it with many cases:

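import csv
import os

ext = os.path.splitext(f)[1].lower()
if ext == '.txt':
    # Use a separate name for the file object so the path in `f` is not shadowed
    with open(f, 'r') as fh:
        L = fh.read().splitlines()
elif ext == '.csv':
    reader = csv.reader(...)
    ...
elif ext == '.xls':
    ...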

but is there a general higher-level tool in Python that does all of this directly?

Basj
  • There are a lot of import/export libraries available online. I found [this Stack question](https://stackoverflow.com/questions/444522/import-and-export-excel-what-is-the-best-library) particularly interesting; although it's for C#, the same principles generally apply. – Samuel Hulla Jun 21 '18 at 08:41
  • You could take a look at the [`pyexcel`](https://pythonhosted.org/pyexcel/) library. – Martin Evans Jun 22 '18 at 09:38
  • @MartinEvans, this looks interesting. There are many variations: `pyexcel-io`, `pyexcel-xls`, etc. Would you want to post an answer with it? – Basj Jun 22 '18 at 09:55

1 Answer

The pyexcel library provides this kind of high-level abstraction, letting you handle data files in different formats through a consistent interface:

pyexcel provides one application programming interface to read, manipulate and write data in different excel formats. This library makes information processing involving excel files an enjoyable task. The data in excel files can be turned into array or dict with least code, vice versa. This library focuses on data processing using excel files as storage media hence fonts, colors and charts were not and will not be considered.

A simple usage example:

>>> import pyexcel as pe
>>> records = pe.iget_records(file_name="your_file.xls")
>>> for record in records:
...     print("%s is aged at %d" % (record['Name'], record['Age']))
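
For the goal stated in the question (the first column as a list), a minimal sketch might look like the following. It is not taken from the pyexcel documentation: `first_column` is a hypothetical helper, the delimiter guess (whichever of `;` and `,` occurs more often on the first line) is a simplifying assumption, and the Excel branch assumes that pyexcel and the matching plugin (`pyexcel-xls` or `pyexcel-xlsx`) are installed. The TXT and CSV branches use only the standard library.

```python
import csv
import io
import os


def first_column(path):
    """Return the first column of a TXT, CSV, XLS, or XLSX file as a list of strings.

    Hypothetical helper written for illustration only.
    """
    ext = os.path.splitext(path)[1].lower()

    if ext in (".xls", ".xlsx"):
        # Assumption: pyexcel and its format plugin are installed. The import
        # is done lazily so the text-based branches work without pyexcel.
        import pyexcel as pe

        rows = pe.get_array(file_name=path)  # list of rows, each a list of cells
        return [str(row[0]) for row in rows if row]

    # newline="" leaves line endings untouched, as the csv module recommends.
    with open(path, newline="", encoding="utf-8") as fh:
        text = fh.read()

    if ext == ".txt":
        # One item per line; no delimiter handling at all.
        return text.splitlines()

    if ext == ".csv":
        # Assumption: the delimiter is ';' or ','. Pick whichever is more
        # frequent on the first line; a single-column file falls back to ','
        # and each row then holds exactly one field.
        first_line = next(iter(text.splitlines()), "")
        delimiter = ";" if first_line.count(";") > first_line.count(",") else ","
        reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
        return [row[0] for row in reader if row]

    raise ValueError("unsupported file extension: %r" % ext)
```

Calling `first_column("your_file.csv")` or `first_column("your_file.xls")` then goes through the same entry point regardless of format. Header rows are not skipped, and a single-column CSV whose items themselves contain commas would be split incorrectly by the guess above, so adjust as needed.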
Martin Evans