1

I'm new to Python, working on my first project; challenging myself to incorporate OOP into it. I've pre-written it procedurally and it works fine, however looking for best practices to re-work it into OOP.

My question for you: Is it better to call Beautiful soup within a class, so that each object is a self-contained entity, as I've done below in the discography function? (I've got multiple classes, each would call Beautiful Soup accordingly) Or, would it make more sense to call it globally; in which case how would I pass all the necessary variables into each of the classes?

Below; an example class I'm working on. Input a band name, and returns a dict of all their albums with URLs to the album's wikipedia page.

import urllib.request
from bs4 import BeautifulSoup


class Band:

    def __init__(self, name):
        self.name = name

    def discography(self):
        url = 'https://en.wikipedia.org/wiki/Special:Search?search=%s_discography' % self.name
        page = urllib.request.urlopen(url)
        soup = BeautifulSoup(page, 'html.parser')
        table = soup.find('table', attrs={'class': 'wikitable plainrowheaders'})
        titles = table.findAll(attrs={'scope': 'row'})


        def getAlbumTitles(url):
            albumTitles = [title.text for title in titles]
            return albumTitles

        def getDiscogLinks(url):
            album_href = [row.findAll('a') for row in titles]
            clean_links = []

            for i in album_href:
                for href in i:
                    if href.parent.name == 'i':
                        clean_links.append('https://en.wikipedia.org' + href.get('href'))

            return clean_links

        return dict(zip(getAlbumTitles(url), getDiscogLinks(url)))

name = input('Enter a band name: ')
artist = Band(name)
print(artist.discography())

#beatles = Band('beatles')
#print(beatles.discography())
Will
  • 169
  • 1
  • 3
  • 12
  • Somewhat unrelated: You *really* should not build URLs like this. Query strings need to be properly encoded, use the right tool. [How to urlencode a querystring in Python?](https://stackoverflow.com/questions/5607551/how-to-urlencode-a-querystring-in-python) - look at the top 2 answers. – Tomalak Nov 28 '17 at 21:35
  • Thank you... I appreciate the feedback. But, what is the benefit of that as opposed to just piecing together a string? Is it any faster? Or just a better practice for building a more scaleable code? – Will Nov 28 '17 at 23:11
  • The benefit is that piecing together a string is wrong and using urlencode is not. URLs have an encoding format you need to adhere to. As soon as your data contains special characters, the "piecing together" method will fall apart, making your program fail for real-life input while it works for simple test cases. It also makes your code susceptible to injection attacks. – Tomalak Nov 29 '17 at 04:19
  • 1
    This applies to each and every situation where you combine your program code and user input - whether you build HTML, SQL, shell commands, you name it. "Just piecing together strings" is to be considered dangerous and wrong and you need to train your mind to red-flag it. Most of the attacks against database- and web servers out there are the result of some developer thinking *"Oh, let's just piece together those two strings here, what could go wrong?"*. It might not seem as a big deal in your program here, but it sits there and is sloppy at least and a lingering bug at worst. – Tomalak Nov 29 '17 at 04:30
  • Wow... thank you. I had no idea that it was that serious of an issue. Alright. Going to urlencode it then. Thank you for all of your advice! – Will Nov 29 '17 at 23:45

0 Answers0