
I have a data engineering program that grabs some data from federal government websites and transforms it. I'm a bit confused about whether I need to use the 'self' keyword, or whether it's better practice not to use a class at all. This is how it's currently organized:

class GetGovtData:

    def get_data_1(arg1=0, arg2=1):
        df = conduct_some_operations  # placeholder for the real download/transform steps
        return df

    def get_data_2(arg1=4, arg2=5):
        df = conduct_some_operations_two  # placeholder, as above
        return df

I'm mostly using a class here for organization purposes. For instance, there might be a dozen different methods from one class that I need to use. I find this more aesthetically pleasing and easier to type:

from data.get_govt_data import GetGovtData

df1 = GetGovtData.get_data_1()
df2 = GetGovtData.get_data_2()

Rather than:

from data import get_govt_data

df1 = get_govt_data.get_data_1()
df2 = get_govt_data.get_data_2()

That version just has a boatload of underscores. So I'm curious: would it be considered bad code to use a class like this, without bothering with 'self'? Or should I just eliminate the classes and put a bunch of functions in my files instead?

mpSchrader
Ragnar Lothbrok
  • Lots of syntax errors here. Are those supposed to be methods? They don't appear to be inside a class. Were they meant to be indented? Also, `Class` is not a Python keyword. Did you mean `class`? If those two functions are in fact meant to be class methods, or static methods, then use the appropriate decorator to flag them as such. A class method wants to know the class, but doesn't require an instance of the class. A static method is just a function defined inside the class. – Tom Karzes Oct 15 '20 at 06:46
  • You can still achieve the same without using a class, and here you shouldn't be using a class. Rename your file to `GetGovtData.py`, import it with `from data import GetGovtData` (but without the class), and the way you call the functions will still work. – sai Oct 15 '20 at 06:46
  • Aren't there issues with naming .py files using camelCase or CapitalCase rather than using lowercase and underscores? – Ragnar Lothbrok Oct 15 '20 at 06:48
  • No, you absolutely should *not* use classes this way. And note, I don't think anyone has insulted your intelligence. What has been pointed out is that you aren't using classes the way they are meant to be used. And it would be helpful if you used a good tutorial, e.g. the [official tutorial](https://docs.python.org/3/tutorial/classes.html) because StackOverflow cannot substitute for that. If you are using classes like this, you should at least make the methods `staticmethod`s. My advice: take comments at face value, they are usually meant as advice, and not as gratuitous insults. – juanpa.arrivillaga Oct 15 '20 at 07:06
  • Thanks Ragnar! I am also quite new to SO, but feel the same way as you. – mpSchrader Oct 15 '20 at 07:06
  • @RagnarLothbrok using `snake_case` for file names is a convention, and honestly, if that is your main concern for not using a module (which would be the intended use-case, to organize your code, especially functions and classes) then simply breaking that convention and using `CapitalCase` would probably be the least bad solution – juanpa.arrivillaga Oct 15 '20 at 07:08
  • *Camel case* is just a convention, not a hard-and-fast rule, so as @juanpa.arrivillaga mentioned, it's not a bad solution, just a trade-off. – sai Oct 15 '20 at 07:12

3 Answers


If you define functions within a Python class, there are two ways to do it: with self as the first parameter, or without self.

So, what is the difference between the two?

Function with self

The first one is a method, which can access the data stored in the object it is called on. This allows you to work with the internal state of an individual object, e.g., a counter of some sort. These are the methods you usually write when doing object-oriented programming. A short intro can be found here [External Link]. To call such a method, you first need to create an instance of the class.
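As a minimal sketch (the class and method names are hypothetical, not from the question), a method that takes self can keep separate state for each object, such as a counter:

```python
class RequestCounter:
    """Hypothetical example: each instance tracks its own count."""

    def __init__(self):
        # Per-instance state, stored on self
        self.count = 0

    def record(self):
        # Reads and updates the state of this particular object only
        self.count += 1
        return self.count


a = RequestCounter()
b = RequestCounter()
a.record()
a.record()
b.record()
print(a.count, b.count)  # 2 1
```

Each object keeps its own count, which is exactly what a function without self cannot do.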

Function without self

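A function without self can be called without creating an instance of the class, which is why you can call it directly on the imported class. Calling it on an instance instead would not work as intended, because Python would then pass the instance in as the first argument.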
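A small sketch of both call styles, reusing the question's get_data_1 signature with a hypothetical body that simply returns its arguments so they can be inspected:

```python
class GetGovtData:
    def get_data_1(arg1=0, arg2=1):
        # Hypothetical body: return the arguments so we can see what was passed
        return (arg1, arg2)


# Called on the class: behaves like an ordinary function
print(GetGovtData.get_data_1())  # (0, 1)

# Called on an instance: Python binds the instance to arg1
instance = GetGovtData()
result = instance.get_data_1()
print(result[0] is instance)  # True
```

Because arg1 has a default value, the instance call does not raise an error; it silently receives the wrong value, which makes this pattern easy to misuse.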

Alternative solution

This is based on Tom K.'s comment. Instead of just leaving out self, you can mark the method with the @staticmethod decorator to make its role within the class explicit. Some more info can be found here [External link].
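A minimal sketch of the same method marked with @staticmethod (the body is again a hypothetical placeholder): the decorator tells Python not to pass anything automatically, so calling through the class and through an instance behave the same way.

```python
class GetGovtData:
    @staticmethod
    def get_data_1(arg1=0, arg2=1):
        # Hypothetical body: return the arguments so the call can be inspected
        return (arg1, arg2)


# No instance or class is inserted as the first argument in either case
print(GetGovtData.get_data_1())    # (0, 1)
print(GetGovtData().get_data_1())  # (0, 1)
```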

Final thought

To answer your initial question: you do not need to use self. In your case you do not need it, because your functions do not rely on any state stored in an object. Nevertheless, if you are using classes, you should think about an object-oriented design.

mpSchrader
  • Thanks Tom for the input. I included it in my answer. – mpSchrader Oct 15 '20 at 07:23
  • Thanks! This is a great answer. So essentially, what I was trying to do was create static methods and I could've just used the @staticmethod decorator? I think I'll probably just change the design to avoid classes in this instance (as I don't think it's necessary here), but this is very useful. – Ragnar Lothbrok Oct 15 '20 at 07:41
  • @RagnarLothbrok The `@staticmethod` decorator is for methods that do not require a class instance, or even a class. They're just functions that have been placed inside a class. You can invoke them through either the class name or a class instance and the decorator will make sure there is no `self` argument. There are also class methods, indicated by the decorator `@classmethod`. In this case, the first argument is a class rather than a class instance, and is usually called `cls`. This may be the class itself, or it may be any class that uses it as a base class (directly or indirectly). – Tom Karzes Oct 15 '20 at 08:45
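The difference Tom describes can be sketched like this (the class names and the name attribute are hypothetical): a class method receives whichever class it was called on, including a subclass, while a static method receives nothing extra.

```python
class GovtSource:
    # Hypothetical class attribute; subclasses override it
    name = "generic"

    @classmethod
    def describe(cls):
        # cls is the class the method was called on, not an instance
        return f"{cls.__name__} reads from {cls.name}"

    @staticmethod
    def clean(value):
        # No self or cls: just a function living in the class namespace
        return value.strip()


class CensusSource(GovtSource):
    name = "census"


print(GovtSource.describe())      # GovtSource reads from generic
print(CensusSource.describe())    # CensusSource reads from census
print(CensusSource().describe())  # CensusSource reads from census
print(GovtSource.clean("  x  "))  # x
```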

I suppose you have a file called data/get_govt_data.py that contains your first code block. You can just rename that file to data/GetGovtData.py, remove the class line (and dedent the functions), and not bother with classes at all, if you like. Then you can do

from data import GetGovtData

df1 = GetGovtData.get_data_1()

Depending on your setup, you may need to create an empty file data/__init__.py for Python to see data as a package.
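For reference, the renamed data/GetGovtData.py might look like the sketch below. The function bodies are hypothetical placeholders: the question builds DataFrames there, but plain dicts are returned here so the sketch has no dependencies.

```python
# data/GetGovtData.py  (hypothetical module contents; no class needed)


def get_data_1(arg1=0, arg2=1):
    # Placeholder for the real download/transform logic
    return {"source": "data_1", "args": (arg1, arg2)}


def get_data_2(arg1=4, arg2=5):
    # Placeholder, as above
    return {"source": "data_2", "args": (arg1, arg2)}
```

The module itself now does the grouping the class was doing, so `GetGovtData.get_data_1()` reads the same at the call site.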

EDIT: Regarding the file naming, Python imposes no real restrictions here, as long as the name is a valid identifier so it can be imported. Note, however, that the common convention (PEP 8) is to use short lowercase names for modules and CapitalCase for classes. Using CapitalCase for a module may therefore make others assume for a second that it's a class. If you do not use classes in your project, that risk is small, so you may choose not to follow this convention.

TerhorstD
  • This is what I wanted to do, but my issue is this: aren't there issues with naming .py files using camelCase or CapitalCase? https://stackoverflow.com/a/42127721/8309944 – Ragnar Lothbrok Oct 15 '20 at 06:55
  • The case of .py files does not cause issues. It's just convention. See edit. – TerhorstD Oct 16 '20 at 07:22

To answer the question in the title first: the exact name 'self' is a convention (one I can see no valid reason to ignore, BTW), but the first parameter of a regular method always receives a reference to the class instance.
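To illustrate that only the position matters (not a recommendation; PEP 8 asks you to use self), here is a hypothetical class that calls its first parameter this. Calling obj.get() is equivalent to passing the instance explicitly to the function stored on the class:

```python
class Example:
    def __init__(this, value):
        # 'this' plays exactly the role 'self' normally does
        this.value = value

    def get(this):
        return this.value


obj = Example(42)
print(obj.get())         # 42
print(Example.get(obj))  # 42: same call, with the instance passed explicitly
```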

Whether you should use a class or plain functions depends on whether the functions share state. From your scenario, it sounds like they may share a common base URL, authentication data, database names, etc. Maybe you even need to establish a connection first? All of that would be best held in the class and then used by its methods.
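A minimal sketch of that idea, assuming a hypothetical API with a base URL and an optional API key (the URL, endpoint names, and parameters are made up). The shared settings are stored once on self and reused by every method; instead of downloading anything, the methods just return the URL they would fetch.

```python
from urllib.parse import urlencode


class GovtDataClient:
    """Hypothetical client holding settings shared by all get_* methods."""

    def __init__(self, base_url, api_key=None):
        # Shared state, set once per client
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def _build_url(self, endpoint, **params):
        # Every method reuses the same base URL and credentials
        if self.api_key is not None:
            params["api_key"] = self.api_key
        query = urlencode(params)
        return f"{self.base_url}/{endpoint}" + (f"?{query}" if query else "")

    def get_data_1(self, arg1=0, arg2=1):
        # Placeholder: a real version would download and transform this URL
        return self._build_url("dataset1", arg1=arg1, arg2=arg2)

    def get_data_2(self, arg1=4, arg2=5):
        return self._build_url("dataset2", arg1=arg1, arg2=arg2)


client = GovtDataClient("https://api.example.com", api_key="KEY")
print(client.get_data_1())
# https://api.example.com/dataset1?arg1=0&arg2=1&api_key=KEY
```

If the functions end up needing none of this shared state, the plain-module approach from the other answer is simpler.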

Jann Poppinga