I am scraping a web page that lists e-sport matches: https://www.over.gg/matches

As you can see, the tournament name is repeated on each line. In my code, a tournament is its own class, with a name and several attributes that are created on the fly.

The idea is that, for a given line/match, I want to check if there is already an instance of the tournament class created in the scope (by checking the name attribute), so I do not create a new tournament object for each match. Instead I only "attach" the match to the already existing tournament.

I could manually keep a list of every object I create during execution and check it each time to see whether the object already exists, but I wonder if there is a built-in, smarter way of doing that.

It's like checking in a DB whether a record with the same primary key already exists, to avoid creating a new one. I found some leads, but none of them address my issue directly.

Doezer

1 Answer

I used the Flyweight pattern, as suggested by avix and as explained in this article by Yannick Loiseau, specifically the Gang of Four version.

For creation of the classes this gives:

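class FlyweightFactory(object):
    """Caches one instance of cls per distinct set of constructor arguments."""

    def __init__(self, cls):
        self._cls = cls
        self._instances = dict()

    def get_instance(self, *args, **kargs):
        # Arguments must be hashable, since they form the cache key.
        key = (args, tuple(sorted(kargs.items())))
        # Build the object only on a miss; setdefault would build it on every call.
        if key not in self._instances:
            self._instances[key] = self._cls(*args, **kargs)
        return self._instances[key]


class Match(object):
    def __init__(self, link, tournament):
        self.tn = tournament
        self.link = link


MatchFactory = FlyweightFactory(Match)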
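
To check that the factory hands back the same object for the same arguments, the sketch below restates a compact variant of it (one that constructs the object only when the key is missing) so the block runs on its own, then requests the same match twice. The `created` counter and the sample link and tournament values are assumptions added for the demonstration.

```python
class FlyweightFactory(object):
    def __init__(self, cls):
        self._cls = cls
        self._instances = {}

    def get_instance(self, *args, **kwargs):
        # The constructor arguments form the cache key, so they must be hashable.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in self._instances:
            self._instances[key] = self._cls(*args, **kwargs)
        return self._instances[key]


class Match(object):
    created = 0  # illustrative counter of constructor calls

    def __init__(self, link, tournament):
        Match.created += 1
        self.link = link
        self.tn = tournament


MatchFactory = FlyweightFactory(Match)

first = MatchFactory.get_instance("/match/1", "Tournament A")
again = MatchFactory.get_instance("/match/1", "Tournament A")
other = MatchFactory.get_instance("/match/2", "Tournament A")

print(first is again)   # True: same arguments, same object
print(first is other)   # False: different link, different object
print(Match.created)    # 2: the constructor ran once per distinct key
```

One thing to keep in mind is that positional and keyword calls produce different keys, so `get_instance("/match/1", "Tournament A")` and `get_instance(link="/match/1", tournament="Tournament A")` would yield two separate objects.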

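Then I wrote a small helper that returns the existing object if there is one and creates it otherwise:

def get_match_object(link, tournament):
    try:
        # Returns the cached Match for these arguments, creating it if needed.
        match = MatchFactory.get_instance(link, tournament)
    except TypeError:
        # Unhashable arguments cannot be used as a cache key; fall back to an uncached Match.
        match = Match(link, tournament)
    return match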

I did the same for the Tournament and Team classes, and it works as expected.

Doezer