How to use csv.DictReader on a tarfile object in Python 3.6?

Question

Here's the issue I'm running into:

Error: iterator should return strings, not bytes (did you open the file in text mode?)

The code that's causing this looks something like:

import csv
import tarfile

with tarfile.open(filename) as t:
    for fileinfo in t:
        f = t.extractfile(fileinfo)  # binary file object
        reader = csv.DictReader(f)
        reader.fieldnames  # raises the error above

The trouble seems to be that extractfile() returns an io.BufferedReader, a basic binary file-like object with no text interface, whereas csv.DictReader expects an iterator that yields strings.

What would be a good way to handle this?

I'm considering decoding the bytes from the reader into text, but I need to keep streaming because these files are very large. The code runs on Python 3.6 in Docker on Linux.

I'm too lazy to tar a csv file and post a complete and tested solution, but you should take a look at [`io.TextIOWrapper`](https://docs.python.org/3/library/io.html#io.TextIOWrapper). — Aran-Fey, Oct 02 '18 at 21:08
Can't you just wrap it as a text stream using the [`codecs`](https://docs.python.org/3/library/codecs.html) module? Something like `codecs.getreader("utf-8")(t.extractfile(fileinfo))`? — zwer, Oct 02 '18 at 21:12

score 0 · Answer 1 · answered Oct 02 '18 at 21:34

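Thanks to both @Aran-Fey and @zwer, who led me to another Stack Overflow question that answered this. Wrapping the extracted file in a UTF-8 stream reader decodes it as it is read, so nothing has to be loaded up front: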

import codecs
import csv
import tarfile

with tarfile.open(filename) as t:
    for fileinfo in t:
        with t.extractfile(fileinfo) as f:
            ft = codecs.getreader("utf-8")(f)  # decode bytes to text on the fly
            reader = csv.DictReader(ft)
            reader.fieldnames

This seems to work so far.
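
For comparison, here is a minimal sketch of the io.TextIOWrapper approach suggested in the comments. It assumes the members are UTF-8 encoded CSV files. The helper name iter_csv_rows and its parameters are made up for illustration and are not part of any library. It also skips members for which extractfile() returns None (directories, for example), a case the loop above does not handle.

```python
import csv
import io
import tarfile


def iter_csv_rows(tar_path, encoding="utf-8"):
    """Yield (member_name, row_dict) for each CSV row in each regular file.

    Hypothetical helper: assumes every regular member is a CSV file
    in the given encoding.
    """
    with tarfile.open(tar_path) as t:
        for fileinfo in t:
            f = t.extractfile(fileinfo)
            if f is None:  # directories and other non-file members
                continue
            # newline="" lets the csv module handle line endings itself,
            # as the csv documentation recommends for text-mode files.
            with io.TextIOWrapper(f, encoding=encoding, newline="") as text:
                for row in csv.DictReader(text):
                    yield fileinfo.name, row
```

Because this is a generator, rows are decoded and parsed one at a time as the caller iterates, e.g. `for name, row in iter_csv_rows("archive.tar"): ...`, so large members should not need to fit in memory.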
