0

I am brand new to coding (a couple of weeks in) and I'm writing some code to scrape some HTML with BeautifulSoup. Most of my questions are about how to structure classes, and I don't really understand some of the explanations I've found, so if possible, please break it down for me.

I have 2 questions that build on one another:

  1. I am still unsure when and how to use self. The point of this class is really just to organize functions, and I'd also like to practice using classes in general. The confusing part is that the code runs without the self argument, yet many sources I've read say that argument is required. I suspect these could be static functions, but when I try to set that up I don't really understand what I am doing and I get errors. If someone could help me fill in the blanks in my knowledge, that would be great.

  2. From what I have read about classes, I believe I can use __init__ to define a few variables or processes that all the functions in the class can reference. Here I am trying to pass page and soup to the three functions inside class UND, since they never change in the program and I hate repeating those lines in every function. The issue is that if I try to set up __init__ to pass the variables, it does not work because I haven't used self. I'm just super confused.

The first box of code is what I am using right now and it works! But I feel I need to address the issues outlined above.

The second box of code is my attempt to use __init__, but it has errors that it is missing 'self'

Thank you to all who may (or may not!) help!

import requests
from bs4 import BeautifulSoup

class UND:
    def getFlightCategory():    # Takes the appropriate html text and sets it to a variable
        page = requests.get("http://sof.aero.und.edu")
        soup = BeautifulSoup(page.content, "html.parser")
        flightCategoryClass = soup.find(class_="auto-style1b")
        return flightCategoryClass.get_text()

    def getRestrictions():  # Takes the appropriate html text and sets it to a variable
        page = requests.get("http://sof.aero.und.edu")
        soup = BeautifulSoup(page.content, "html.parser")
        flightRestrictionsClass = soup.find(class_="auto-style4")
        return flightRestrictionsClass.get_text()

    def getInfo():
        return UND.getFlightCategory(), UND.getRestrictions()

class UND:
    def __init__(self):
        page = requests.get("http://sof.aero.und.edu")
        soup = BeautifulSoup(page.content, "html.parser")

    def getFlightCategory(self):    # Takes the appropriate html text and sets it to a variable
        flightCategoryClass = soup.find(class_="auto-style1b")
        return flightCategoryClass.get_text()

    def getRestrictions(self):  # Takes the appropriate html text and sets it to a variable
        flightRestrictionsClass = soup.find(class_="auto-style4")
        return flightRestrictionsClass.get_text()

    def getInfo():
        return UND.getFlightCategory(), UND.getRestrictions()

frogs114
  • Not exactly a duplicate, but I strongly suspect reading [What `__init__` and `self` do in Python?](https://stackoverflow.com/q/625083/364696) will answer many of your questions. – ShadowRanger Jan 06 '21 at 01:40
  • "I have 2 questions": then [please ask them separately](https://meta.stackexchange.com/q/39223/248627). And please make sure that each is on-topic as defined in the [help/on-topic]. – ChrisGPT was on strike Jan 06 '21 at 01:56

3 Answers

1

It's a tricky thing about Python and very commonly confused.

Inside a method, self refers to the instance of the class that the method was called on (an instance is an "object" of the class).

self is not a keyword, so any other name would work; using self is just a convention.

Why does it work without self? Presumably because you are calling the function on the class itself (e.g. UND.getRestrictions()) without creating an object of the class.

When would it fail?

object_und = UND()
object_und.getRestrictions()

This call passes the instance automatically as the first argument, so getRestrictions must accept it as its first parameter, i.e.

def getRestrictions(self):
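
To see both cases side by side, here is a minimal sketch. The Demo class and its method names are made up for illustration and are not part of the question's code. It shows a method defined without self working when called on the class but raising TypeError when called on an instance, because Python passes the instance as an extra first argument.

```python
class Demo:
    def no_self():
        # No parameters: only works when called directly on the class.
        return "called on the class"

    def with_self(self):
        # Receives the instance automatically as its first argument.
        return "called on an instance"


# Calling on the class passes no implicit argument, so this works.
print(Demo.no_self())

# Calling on an instance passes that instance as the first argument,
# which no_self() has no parameter for, so Python raises TypeError.
try:
    Demo().no_self()
except TypeError as exc:
    print("TypeError:", exc)

# A method that accepts self works on an instance.
print(Demo().with_self())
```
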
Chaitanya Bapat
0

Your second attempt is close to what you need. self is the instance of the class that is constructed for you when you make a UND object, and __init__ is responsible for initializing it. Any value you want to preserve from __init__ has to be attached to self as an attribute, and you need to read that attribute back out from self later. Here is a fixed version of your code, with comments on every addition and modification and why it was made:

import requests
from bs4 import BeautifulSoup

class UND:
    def __init__(self):
        # Doesn't need to be an attribute, since you never use page outside __init__
        page = requests.get("http://sof.aero.und.edu")

        # Needs to be an attribute since you use soup in other methods
        self.soup = BeautifulSoup(page.content, "html.parser")
    #   ^^^^^ New, you need to attach soup as an attribute of self; otherwise it's a regular local that goes away when __init__ returns

    def getFlightCategory(self):    # Takes the appropriate html text and sets it to a variable
        flightCategoryClass = self.soup.find(class_="auto-style1b")
                            # ^^^^^ New, soup is found attached to the self you received
        return flightCategoryClass.get_text()

    def getRestrictions(self):  # Takes the appropriate html text and sets it to a variable
        flightRestrictionsClass = self.soup.find(class_="auto-style4")
                                # ^^^^^ New, soup is found attached to the self you received
        return flightRestrictionsClass.get_text()

    def getInfo(self):
        #       ^^^^ New, you must receive self as this is an instance method
        return self.getFlightCategory(), self.getRestrictions()
        #      ^^^^                      ^^^^ Changed, call the methods on the self 
        # you received so they receive the existing instance themselves

You'd use this class by creating an instance:

und = UND()

then calling whatever methods you liked on that instance, e.g. print(und.getFlightCategory()) to get the flight category data and print it.
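
If you want to try the same pattern without touching the network, here is a small sketch under stated assumptions: the Report class, its methods, and the sample string are all hypothetical stand-ins for UND and the downloaded page. It shows that a plain local in __init__ disappears, while anything attached to self is available to every other method.

```python
class Report:
    def __init__(self, text):
        # Plain local variable: discarded when __init__ returns.
        cleaned = text.strip()
        # Attribute on self: stays with the instance for later methods.
        self.words = cleaned.split()

    def first_word(self):
        return self.words[0]

    def word_count(self):
        return len(self.words)

    def summary(self):
        # Call the other methods through self so they get the same instance.
        return self.first_word(), self.word_count()


report = Report("  VFR no restrictions  ")
print(report.summary())             # ('VFR', 3)
print(hasattr(report, "words"))     # True
print(hasattr(report, "cleaned"))   # False
```
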

In general, you want to use self for all instance methods in Python. On rare occasion, a method of a class may not rely on an actual instance (it's logically a top-level function, but its usefulness is so tightly bound to the class that it's convenient to have it attached there). In those cases you omit the self argument, and decorate the method with @staticmethod, e.g.:

 class Clock:
     ...  # Rest of class

     @staticmethod
     def bong():
         print("Bong!")

In this case, all clocks go bong the same way; there's no need to refer to an instance to figure out what to do. The method only makes sense in the context of Clock, but it doesn't need a clock, so you make it a static method. That means you can call Clock.bong() or clock_instance.bong() (e.g. self.bong() in an instance method of Clock) and it will work either way. @classmethod is another exception; I won't go into details here, but it's basically how you make a subclass-friendly alternate constructor. You're welcome to research that later; it's not necessary this early in your Python experience.
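
For when you do get curious, here is a rough sketch of both decorators together. Everything beyond bong() is an assumption added for illustration: the hour attribute, chime(), from_string(), and the CuckooClock subclass are all hypothetical.

```python
class Clock:
    def __init__(self, hour):
        self.hour = hour

    @staticmethod
    def bong():
        # No self: the result never depends on a particular clock.
        return "Bong!"

    def chime(self):
        # A static method can still be reached through self.
        return " ".join(self.bong() for _ in range(self.hour))

    @classmethod
    def from_string(cls, text):
        # cls is whichever class this was called on, so subclasses
        # get instances of themselves back.
        return cls(int(text.split(":")[0]))


class CuckooClock(Clock):
    pass


print(Clock.bong())                    # Bong!
print(Clock(3).chime())                # Bong! Bong! Bong!
cuckoo = CuckooClock.from_string("2:30")
print(type(cuckoo).__name__, cuckoo.hour)   # CuckooClock 2
```
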

If you need further information on the purpose of __init__ and self, I strongly recommend reading the more general question [What `__init__` and `self` do in Python?](https://stackoverflow.com/q/625083/364696).

ShadowRanger
  • Okay, I think I get most of what you're getting at. The use of self directly references the class, correct? So when we want to call getInfo(), we have to call each function and reference what class it is in for it to execute? i.e you could call it by saying UND().getInfo # which would call UND().getFlightrestrictions etc. Is this correct? – frogs114 Jan 06 '21 at 02:55
  • @frogs114: `self` is the instance the method was called on. When you do `und = UND()`, you make an instance of the `UND` class. When you do `und.getInfo()`, it calls the `getInfo` method in the class (attribute lookup on instances checks the class as well), and implicitly binds `self` to the `und` you called it with; `self` can do everything `und` could do (because it's just an alias to the same object bound to `und`), including calling other methods. – ShadowRanger Jan 06 '21 at 03:34
  • And yes, you don't need to name an instance, you could do `UND().getInfo()` and it would construct an instance, call the method on it (which receives `self`) and when the method completes, the instance disappears. If you want to call multiple methods though, you probably don't want to reconstruct it every time, since that means downloading and parsing the page it's based on multiple times. – ShadowRanger Jan 06 '21 at 03:35
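
The two comments above can be checked with a small sketch. The Counter class and its created attribute are hypothetical; the increment in __init__ merely stands in for the expensive download and parse. It shows that self is the very object the method was called on, and that every ClassName() call runs __init__ again.

```python
class Counter:
    created = 0  # class attribute: how many times __init__ has run

    def __init__(self):
        # Stand-in for expensive setup such as downloading a page.
        Counter.created += 1

    def who(self):
        # Return self so the caller can compare it with the instance.
        return self


c = Counter()
print(c.who() is c)          # True: self is the same object as c
print(Counter.who(c) is c)   # True: c.who() is shorthand for Counter.who(c)

# Each call below builds a brand new instance and reruns __init__.
Counter().who()
Counter().who()
print(Counter.created)       # 3
```
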
-1

Using Self:

When you are going to instantiate a class, i.e. declare an object of a class, you'll have to use self. For example you have two objects of a class. Then the self will refer to the object you are calling from. In your case, the class is UND so to create an object,

object1 = new UND()

Not using self:

When you are not going to create objects for a class, you shouldn't use self.

For the __init__() method: __init__() is just a constructor method. It is used to initialize values, but it does not have global variables.

try like this:

class UND:

page=''
soup=''

def __init__():
global page,soup
page=...
soup=....
  • `__init__` cannot define variables, it can only set values to them. So use global variables. – Sankara Subramanian Jan 06 '21 at 01:35
  • Everything about this is wrong. `__init__` *must* take a `self` argument (not necessarily by that name, but it's always going to get one and won't work without it). `new` is not a keyword for creating a class instance in Python. Omitting `self` because "you don't plan to create instances" is a good clue you don't need a class at all; on the rare occasions it make sense, you should be using the `@staticmethod` decorator on the method so it works even if you call it on an instance of the class. – ShadowRanger Jan 06 '21 at 01:43