
I'm trying to write a custom extraction method for Babel that extracts strings from a specific column in a CSV file. I followed the documentation here.

Here is my extraction method code:

```python
def extract_csv(fileobj, keywords, comment_tags, options):
    import csv
    reader = csv.DictReader(fileobj, delimiter=',')
    for row in reader:
        if row and row['caption'] != '':
            yield (reader.line_num, '', row['caption'], '')
```

When I try to run the extraction, I get this error:

```
  File "/Users/tiagosilva/repos/naltio/csv_extractor.py", line 18, in extract_csv
    for row in reader:
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 111, in __next__
    self.fieldnames
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
```

It seems the fileobj that is passed to the function was opened in binary mode.
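The diagnosis can be checked without Babel at all. This is a minimal sketch, assuming an in-memory `io.BytesIO` is a fair stand-in for the binary file object Babel passes in; `reproduce_error` is a hypothetical helper name. Handing bytes to `csv.DictReader` fails as soon as it tries to read the header row, with the same `_csv.Error` (the exact wording of the message varies between Python versions).

```python
import csv
import io


def reproduce_error():
    # io.BytesIO stands in for the binary-mode fileobj Babel provides (assumption)
    fileobj = io.BytesIO(b"id,caption\n1,Hello\n")
    reader = csv.DictReader(fileobj, delimiter=',')
    try:
        # Reading the first row forces DictReader to read the header, which fails on bytes
        next(reader)
    except csv.Error as exc:
        return str(exc)
    return None


print(reproduce_error())
```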

How can I make this work? I can think of two possible solutions, but I don't know how to implement them:

1) Is there a way to use the binary file object with DictReader?

2) Is there a way to tell Babel to open the file in text mode?

I'm also open to solutions not listed here.

Asked by tiagosilva (edited by Adrian Mole)

1 Answer


I actually found a way to do it!

It's solution 1: handle the binary file directly. Wrap the binary file object in an `io.TextIOWrapper`, which decodes the bytes into text as it reads, and pass the wrapper to `DictReader`.

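```python
import csv
import io

def extract_csv(fileobj, keywords, comment_tags, options):
    # Babel passes fileobj in binary mode; TextIOWrapper decodes the bytes to str.
    # newline='' is what the csv module expects for files handed to a reader.
    with io.TextIOWrapper(fileobj, encoding='utf-8', newline='') as text_file:
        reader = csv.DictReader(text_file, delimiter=',')

        for row in reader:
            # Skip rows whose 'caption' cell is missing or empty
            if row.get('caption'):
                # Babel expects (lineno, funcname, message, comments),
                # where comments is a list of translator comments
                yield (reader.line_num, '', row['caption'], [])
```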
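One detail worth knowing: closing a `TextIOWrapper` (which the `with` block does on exit) also closes the binary object it wraps. That is harmless when Babel is about to close the file anyway, but if the caller still needs the original file object afterwards, `detach()` hands the buffer back without closing it. A sketch under that assumption, with `read_captions` as a hypothetical helper name and `io.BytesIO` standing in for the binary file:

```python
import csv
import io


def read_captions(fileobj):
    """Yield (line_num, caption) pairs from a binary CSV stream without closing it."""
    text_file = io.TextIOWrapper(fileobj, encoding='utf-8', newline='')
    try:
        reader = csv.DictReader(text_file)
        for row in reader:
            if row.get('caption'):  # skip missing or empty captions
                yield reader.line_num, row['caption']
    finally:
        # detach() returns the underlying binary buffer and leaves it open
        text_file.detach()


binary = io.BytesIO(b"id,caption\n1,Hello\n2,\n3,World\n")
print(list(read_captions(binary)))  # → [(2, 'Hello'), (4, 'World')]
print(binary.closed)  # → False
```

The reported numbers come from `reader.line_num`, which counts physical lines read so far, so the header is line 1 and the skipped empty caption still advances the count.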
Answered by tiagosilva

  • In case it helps anyone else: this approach also works well for a zip file containing one or more CSV files, since Python's zipfile module (in 3.6+, and possibly older versions) only supports opening archive members in binary mode. – Foon Jul 24 '20 at 13:16
  • This compact solution solved the problem I'm facing, where an unknown file blob has already been opened as binary but needs to be handled as text if it's actually a CSV (and I can't change how it is originally ingested). Every other answer I've seen changes how you open it, rather than how you process it. – MartyMacGyver Oct 16 '20 at 19:03
  • Thanks for this. This is a really neat solution to the problem. So far, all the other solutions I've seen ask me to load the entire content of the file in memory before passing it to the CSV reader. – Redowan Delowar Jun 25 '22 at 21:46
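The zip-file case from the first comment works the same way, because `ZipFile.open()` returns a binary file-like object. This is a minimal sketch, assuming a UTF-8 CSV member; the archive is built in memory only to keep the example self-contained, and `build_demo_zip`, `read_zipped_csv` and `captions.csv` are hypothetical names. As in the answer, the reader pulls lines through the wrapper as it goes rather than reading the whole member into memory first.

```python
import csv
import io
import zipfile


def build_demo_zip():
    # Build a small archive in memory (stands in for a real .zip file on disk)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as archive:
        archive.writestr('captions.csv', 'id,caption\n1,Hello\n2,World\n')
    buffer.seek(0)
    return buffer


def read_zipped_csv(zip_source, member):
    with zipfile.ZipFile(zip_source) as archive:
        # archive.open() only yields bytes, so decode them with TextIOWrapper
        with archive.open(member) as binary_member:
            with io.TextIOWrapper(binary_member, encoding='utf-8', newline='') as text_file:
                return [row['caption'] for row in csv.DictReader(text_file)]


print(read_zipped_csv(build_demo_zip(), 'captions.csv'))  # → ['Hello', 'World']
```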