11

I'm trying to subclass io.TextIOWrapper following this post, although my aims are different. Starting off with this (NB: motivation):

class MyTextIOFile(io.TextIOWrapper):
    def read(self, *args):
        cont = super().read(*args)
        return cont.replace("\x00", "")

I'm trying to open a file using my constructor using

In [81]: f = MyTextIOFile("file.csv")

but this gives:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-90-343e18b2e32f> in <module>()
----> 1 f = MyTextIOFile("file.csv")

AttributeError: 'str' object has no attribute 'readable'

And indeed, it appears io.TextIOWrappers constructor expects to be passed a file object. Through trial and error, I discovered this file object needs to be opened in binary mode. But I can't find documentation anywhere, and I don't feel like building on top of undocumented behaviour (indeed, one attempt to go ahead with it already lead me to problems when trying to pass my object to csv.reader). What is the correct and supported way to subclass a file object in Python 3?

I'm using Python 3.5.0.

Community
  • 1
  • 1
gerrit
  • 24,025
  • 17
  • 97
  • 170
  • 2
    Consider using composition instead; have your class use `open` to open the file and save a reference to the object returned instead. – chepner Oct 15 '15 at 18:54
  • @chepner I'm not sure exactly what you mean — do you mean not inheriting from the `io.IOBase` family? Ultimately I want to pass this to `csv.csvreader` and therefore my aim is to read the file stripped of all NULs (see [this question](http://stackoverflow.com/a/4169762/974555)). – gerrit Oct 15 '15 at 18:58
  • Right; `csv.csvreader` doesn't care what type it receives, as long as it implements the iterator protocol (i.e., has a `next` method that can be called to get the next row). – chepner Oct 15 '15 at 19:00
  • @chepner Thank you for teaching me the term composition, it is new for me and could be considered, although I don't know how many methods I'd need to implement to make it work with `csv.reader`. – gerrit Oct 15 '15 at 19:03
  • Just `next`. From the [docs](https://docs.python.org/2/library/csv.html#module-contents): "csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called" – chepner Oct 15 '15 at 19:04
  • @chepner Thanks for pointing that out. For my specific purpose here composition may then be considerably easier. Still curious to see the answer to the inheritance-approach question, though. – gerrit Oct 15 '15 at 19:07

2 Answers2

2

I think the documentation you are looking for is

class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False)
    A buffered text stream over a BufferedIOBase binary stream. [...]

The first argument is a binary stream, which implies something opened in binary mode by open.

chepner
  • 497,756
  • 71
  • 530
  • 681
0

As far as "fixing" your csv file, you could also use a generator:

# untested
def FixCsv(csv_file, *args, **kwds):
    "assumes text-mode file; removes NUL-bytes"
    if isinstance(csv_file, str):
        file_obj = open(csv_file, *args, **kwds)
    else:
        file_obj = csv_file
    for line in file_obj:
        yield line.replace('\x00','')
    file_obj.close()

But your problem is probably caused by a utf-16 encoded file.

Ethan Furman
  • 63,992
  • 20
  • 159
  • 237
  • It's not utf-16 encoded. I don't know the encoding. It's a CSV generated by Surveymonkey. But that's a different question (I realise my question is somewhat of an XY problem). – gerrit Oct 16 '15 at 09:12