
As an exercise in learning OOP, I am trying to convert an existing script into OOP form, without success. My current OOP version creates an object that is not iterable:

<__main__.rawText object at 0x0000029D55515BA8>
TypeError: 'rawText' object is not iterable

The aim of my exercise is to read the content from a CSV file (a collection of product reviews), into a list of lists that will be cleaned up and analysed. How can I produce a list on which I can do list and text operations?

The first script below is my failed attempt, followed by the working non-OOP version:

class rawText(object):
        def __init__(self, name_file):
                self.name_file = name_file

        def read_file(self):
                """Read the file concent"""

                with open(name_file, 'r') as in_file:
                    self = in_file.readlines()
                return self

        def display_file(self):
                print(self)

def main():
        x = rawText('HCPsentiment2.csv')
        x.display_file()

if __name__ == '__main__':
        main()

The above produces something that I cannot run content_cleaner on. Below is my original...

# Step 1A - define the content cleaner
def content_cleaner(feed_list):
    temp_list = [str(item) for item in feed_list]
    temp_list = [item.lower() for item in temp_list]
    temp_list = [item.replace("b\'","").replace("\\x93","").replace("\\x94","").replace("\\x96","")
            .replace('.','').replace(',','').replace(';','').replace(':','').replace('(','').replace(')','')                .replace("'\'","").replace("\\x92","'").replace('"','').replace('"','').replace('[','').replace(']','')
            .replace("\\","'")
             for item in temp_list]
    return list(filter(None, temp_list))

# Step 1B - draw in raw sample text (here a pre-screened csv file)
with open('HCPsentiment2.csv', 'rb') as file:
    content = file.readlines()
    # perform transformation
    content_clean = content_cleaner(content)

# Step 1C - split and clean the sample
content_cl_sp=[phrase.split() for phrase in content_clean]
content_flat = [item for sublist in content_cl_sp for item in sublist]
Patrick Artner
Ian Jones
  • `self = in_file.readlines()` ? What do you think this does? self is the _instance_ of your class .. why do you try to assign something to it? – Patrick Artner Jan 26 '19 at 21:38
  • `open('HCPsentiment2.csv', 'rb')` ... why do you read _text_ as `"rb"` binary? – Patrick Artner Jan 26 '19 at 21:40
  • why all the `item.replace("b\'","").replace("\\x93","").replace("\\x94","").replace("\\x96","") .replace('.','').replace(',','').replace(';','').replace(':','').replace('(','').replace(')','') .replace("'\'","").replace("\\x92","'").replace('"','').replace('"','').replace('[','').replace(']','').replace("\\","'")` ? – Patrick Artner Jan 26 '19 at 21:41
  • ... would using the csv module, pandas or similar to read your csv, respecting quoted strings, not make it far easier? CSV reading is simple, see for example: [how-to-read-csv-file-lines-and-split-elements-in-line-into-list-of-lists](https://stackoverflow.com/questions/45120726/how-to-read-csv-file-lines-and-split-elements-in-line-into-list-of-lists) – Patrick Artner Jan 26 '19 at 21:41
  • Thanks for your reply, Patrick. As a general answer to all your questions: I am relatively new to programming, python and OOP. Specifically... 1. I thought self = in_file.readlines() would create a reader object (I'm not clear from your response whether this assumption was correct or not). 2. 'rb' was a typo 3. The replace statements cut a bunch of junk out of the text 4. Later elements (not shown) cover some of your pandas-related comment – Ian Jones Jan 26 '19 at 22:11
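On the 'rb' and replace() points raised in the comments: the "b'" and "\\x93"-style junk is what str() produces when applied to bytes read in binary mode. A minimal sketch follows, assuming the file is encoded as cp1252 (a guess based on the 0x92, 0x93 and 0x94 bytes; the sample line is invented for illustration). Opening the file in text mode with an explicit encoding returns decoded strings, so most of that cleanup may not be needed.

```python
import os
import tempfile

# Invented sample line: cp1252 curly quotes (0x93, 0x94) and a curly
# apostrophe (0x92). The encoding of the real file is an assumption.
raw = b"\x93Great product\x94, it\x92s fine\n"

with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(raw)

# Binary mode yields bytes; str() on bytes gives the b'...\x93...' text
# that the chain of replace() calls was removing.
with open(tmp.name, "rb") as f:
    as_bytes = f.readlines()
print(str(as_bytes[0]))

# Text mode with an explicit encoding yields ordinary decoded strings.
with open(tmp.name, "r", encoding="cp1252") as f:
    as_text = [line.rstrip("\n") for line in f]
print(as_text[0])

os.remove(tmp.name)
```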

1 Answer

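To make instances of a class iterable, the class needs the special method __iter__. If __iter__ returns self, as in the example below, the class must also define __next__, which makes each instance its own iterator.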

Using

self = in_file.readlines()

does not work: it only rebinds the local name self inside the method (before, it referred to the instance of your class; afterwards, it refers to a list of lines). That does not change the instance itself, nor any other variable that holds the instance of your class.
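A minimal sketch of the difference, using a hypothetical class `Demo` that is not part of the question's code: assigning to `self` changes only the local name, whereas assigning to an attribute such as `self.lines` changes the instance.

```python
class Demo:
    """Hypothetical class, used only to illustrate name rebinding."""

    def rebind(self):
        # Rebinds the local name `self`; the instance itself is untouched.
        self = ["line 1", "line 2"]
        return self

    def store(self):
        # Stores the data on the instance, where it stays available.
        self.lines = ["line 1", "line 2"]
        return self.lines


d = Demo()
result = d.rebind()
print(type(result).__name__)   # list
print(type(d).__name__)        # Demo: d still refers to the instance
print(hasattr(d, "lines"))     # False: nothing was stored on it

d.store()
print(d.lines)                 # ['line 1', 'line 2']
```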


If your CSV is small and you can hold all of its data in memory, you can read in the file and store the lines on the instance:

class rawText(object):
    def __init__(self, name_file):
        self.name_file = name_file
        self.lines = None
        self.idx = 0

    def read_file(self):
        """Read the file content and store it on the instance."""
        with open(self.name_file, 'r') as in_file:
            self.lines = [x.rstrip() for x in in_file]
        return self.lines

    def __next__(self):
        # read the file lazily on first use
        if self.lines is None:
            self.read_file()
        try:
            self.idx += 1
            return self.lines[self.idx - 1]
        except IndexError:
            raise StopIteration

    def __iter__(self): 
        return self

    # replaces your display_file
    def __str__(self):
        return self.name_file + (" : " if self.lines else "") + (
                                 "    ".join(self.lines or []))

Usage:

fn = "t.txt"  # sample file containing the lines a,b,c / 1,2,3 / 4,5,6
rt = rawText(fn)
print(rt)

for line in rt:
    print ("iterated got: " , line)

print(rt)

Output:

t.txt                                # str before reading data
iterated got:  a,b,c                 # iterating over stuff
iterated got:  1,2,3
iterated got:  4,5,6
t.txt : a,b,c    1,2,3    4,5,6      # str after reading data
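Note that the class above is its own iterator, so once `idx` has passed the last line, a second `for` loop over the same instance yields nothing. A possible variation, sketched here with a hypothetical class name `RawTextLines` and a temporary sample file, defines only `__iter__` and delegates to the iterator of the stored list. This allows repeated iteration, and `list(...)` returns a plain list that functions such as `content_cleaner` can work on.

```python
import os
import tempfile


class RawTextLines:
    """Hypothetical variant: iterable, but not its own iterator."""

    def __init__(self, name_file):
        self.name_file = name_file
        self.lines = None

    def read_file(self):
        """Read the file content and store it on the instance."""
        with open(self.name_file, "r") as in_file:
            self.lines = [line.rstrip() for line in in_file]
        return self.lines

    def __iter__(self):
        # Read on first use, then hand out a fresh list iterator each time.
        if self.lines is None:
            self.read_file()
        return iter(self.lines)


# Write a small sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a,b,c\n1,2,3\n4,5,6\n")

rt = RawTextLines(tmp.name)
first_pass = list(rt)    # a plain list of stripped lines
second_pass = list(rt)   # iterating a second time works as well
print(first_pass)
print(first_pass == second_pass)

os.remove(tmp.name)
```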

If your data file is bigger, you might not want to store all the lines inside your class. In that case, modify it to iterate lazily, for example by yielding lines from the file object one at a time.
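One way to do that, as a rough sketch and not a definitive implementation, is to write `__iter__` as a generator that yields one stripped line at a time from the open file. The class name `LazyRawText` and the sample file are assumptions made for illustration.

```python
import os
import tempfile


class LazyRawText:
    """Hypothetical variant that streams lines instead of storing them."""

    def __init__(self, name_file):
        self.name_file = name_file

    def __iter__(self):
        # Generator: the file is reopened for every new iteration and
        # only one line is held in memory at a time.
        with open(self.name_file, "r") as in_file:
            for line in in_file:
                yield line.rstrip()


# Write a small sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a,b,c\n1,2,3\n4,5,6\n")

lazy = LazyRawText(tmp.name)
words = [field for line in lazy for field in line.split(",")]
print(words)

os.remove(tmp.name)
```

Because nothing is cached, each `for` loop reads the file again; whether that trade-off is acceptable depends on the file size and on how often the data is traversed.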

For more information, see the question "How to implement __iter__(self) for a container object (Python)".

Patrick Artner