
I would like to write a scraper that has 3 different "groups" of attributes (or data) that would [and likely should] be kept separately.

I was hoping to use dataclasses and follow Pythonic practices, but dataclasses don't feel appropriate, for reasons explained in more detail below.

The 3 groups [or "interfaces"] are as follows:

#1: HTTP Header fields

  • has defaults, but needs to be mutable at/after object instantiation of the (#3) request class object
  • ideally acts like a dict when using a request method inside #3 request object

#2: API parameters for the URL query request

  • has defaults, but also needs to be mutable at/after instantiation
  • ideally acts like a dict when using a request method inside #3 request object

#3: Response Object (the data) returned to the user from the API server after the request.

  • I would later implement methods for the object to have output formats such as CSV, JSON, SQL DB, S3, etc. That would be [at least] a 4th interface.
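
To make the requirements for #1 and #2 concrete, here is a minimal sketch of a group that has defaults, accepts overrides at creation, stays mutable afterwards, and converts to a plain dict for a request method. The names `DefaultedGroup` and `ApiParams` are hypothetical, and subclassing `collections.UserDict` is only one possible approach, not necessarily the best one:

```python
from collections import UserDict


class DefaultedGroup(UserDict):
    """Hypothetical dict-like group of fields that starts from class-level defaults."""

    defaults = {}

    def __init__(self, overrides=None, **kwargs):
        # Build a fresh dict so instances never mutate the shared defaults.
        super().__init__({**self.defaults, **(overrides or {}), **kwargs})


class ApiParams(DefaultedGroup):
    defaults = {"IsOnlyCurrentSeason": 0, "LeagueID": "00", "Season": "2021-22"}


params = ApiParams(Season="1999-00")  # override a default at creation
params["LeagueID"] = "10"             # update after creation ("10" is a made-up value)
print(dict(params))                   # a plain dict, ready to pass as params=
```

The positional `overrides` argument is there because header names such as "Accept-Encoding" are not valid keyword names, so an HTTP header group would need to take a dict.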

The Task I've been trying to Accomplish

I want an interface where a user can instantiate a class, e.g. Player, with the API params they need and can also update the HTTP header (as needed).


The HTTP Header and the API Params are both easily stored as Python dicts (or JSON). I have included them below.

=> The question is how do I make them mutable in the Request object (Class) at instantiation (creation) and able to be updated after instantiation (creation)?

  • Inheritance via dataclasses? I have tried to put these dictionaries in dataclasses, but dataclasses don't accept mutable defaults such as dicts, so it takes a hack to get around that using default_factory with field from the dataclasses module. It's possible, but it defeats the purpose of using dataclasses to avoid all the extra syntax. Using dataclasses also makes MyDataClass.__dict__ contain far more than PythonClass.__dict__. => Thus use a regular Python class or dict...

  • Using a regular Python class: There seem to be two options to allow mutability of the HTTP header at creation. 1) Inheritance, but that mixes the attributes of the HTTP header with those of the API params. 2) Composition, setting an attribute field to the HTTPClassHeader and doing some work to convert it back to a dict for use in the request_data() method.

  • Putting the Dicts into the Players (Request Class) doesn't allow mutability via a nice keyword interface (or I'm not aware how to implement it).
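
For comparison, the composition option could be sketched with a dataclass roughly as follows. This is an assumption-laden illustration, not a settled design: `DEFAULT_HEADER` is a hypothetical, trimmed-down stand-in for the full header, and merging overrides in `__post_init__` is just one way to keep the defaults while letting the caller change individual fields:

```python
from dataclasses import dataclass, field, fields

# Hypothetical, shortened header used only for this sketch.
DEFAULT_HEADER = {"Referer": "https://www.nba.com/", "x-nba-stats-origin": "stats"}


@dataclass
class Players:
    IsOnlyCurrentSeason: int = 0
    LeagueID: str = "00"
    Season: str = "2021-22"
    # default_factory gives every instance its own dict instead of a shared one.
    header: dict = field(default_factory=dict)

    def __post_init__(self):
        # Layer the caller's overrides on top of a copy of the defaults.
        self.header = {**DEFAULT_HEADER, **self.header}

    def encode_api_params(self) -> dict:
        # Every field except the header is treated as an API parameter.
        return {f.name: getattr(self, f.name) for f in fields(self) if f.name != "header"}

    def get_http_header(self) -> dict:
        return dict(self.header)


c = Players(Season="1999-00", header={"Referer": "https://www.another-website.com/"})
c.header["Sec-GPC"] = "1"  # still mutable after creation
print(c.encode_api_params())
print(c.get_http_header())
```

The header stays a separate attribute, so adding more header fields does not change what `encode_api_params()` returns.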

Here's my code in text form:

class Players:

    __endpoint__ = "CommonallPlayers"

    def __init__(self, IsOnlyCurrentSeason=0, LeagueID="00", Season="2021-22", header=HTTPHeader) -> None:
        # these first 3 attributes constitute the (#2) API Params
        self.IsOnlyCurrentSeason = IsOnlyCurrentSeason
        self.LeagueID = LeagueID
        self.Season = Season

        self.header = header # (1) inherit as a Class or Dict?



    def encode_api_params(self):
        return self.__dict__ # if only 3 attributes, this works, but not if I add more attributes HTTP or self.request_data

    def get_http_header(self):
        # ideally can return the http_header as a dict
        pass

    # ideally this is NOT instantiated (as doesn't have data, shouldn't be accessible to user until AFTER request)
    def request_data(self):
        url_api = f"{BASE_URL}/{self.__endpoint__}"
        return requests.get(url_api, 
                            params=self.encode_api_params(), 
                            headers=self.get_http_header())

# works, has current defaults (current season)
c = Players()

# a common use case, using a different Season than the default (current season)
c = Players(Season="1999-00")

# A possible needed change, with 2 possible desired interfaces
c = Players(Season="1999-00", header={"Referer": "https://www.another-website.com/"})
c = Players(Season="1999-00").header(Referer="https://www.another-website.com/")

# Final outputs
c.request_data().to_csv("downloads/my_data.csv")
c.request_data().to_sql("table-name")
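
The object returned by requests.get does not provide to_csv or to_sql, so the last two lines would need a wrapper of some kind. A minimal sketch, assuming the response data has already been parsed into a list of dicts (the class name `ResultSet` and the sample rows are made up for illustration, and the real payload shape would need to be checked):

```python
import csv
import io


class ResultSet:
    """Hypothetical wrapper around already-parsed rows (a list of dicts)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def to_csv(self, file_obj):
        # Write a header row from the first record's keys, then one line per record.
        if not self.rows:
            return
        writer = csv.DictWriter(file_obj, fieldnames=list(self.rows[0]))
        writer.writeheader()
        writer.writerows(self.rows)


# Made-up rows; writing to an in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
ResultSet([{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]).to_csv(buffer)
print(buffer.getvalue())
```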


Here are the HTTP header, the API params, and the request in their simplest form (running these together would return some data):

import requests

HTTP_HEADER = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Host": "stats.nba.com",
    "Origin": "https://www.nba.com",
    "Referer": "https://www.nba.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
    "Sec-GPC": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
    "x-nba-stats-origin": "stats",
    "x-nba-stats-token": "true",
}

params = {'IsOnlyCurrentSeason': 0, 'LeagueID': '00', 'Season': '2021-22'}


r = requests.get("https://stats.nba.com/stats/commonallplayers", # base url + endpoint
                 params=params, # expects (#2) params, the api parameters to be a dict
                 headers=HTTP_HEADER) # expects (#1) headers to be a dict

r.json()
Paul
  • Is there a question embedded in this somewhere? I see plenty of aspirational statements but no specific question. Open-ended and opinion based questions that boil down to subjective responses are generally not a good fit for this site, since there generally is not a single correct answer but a range of opinions based on different approaches – itprorh66 May 23 '22 at 18:23
  • I appreciate the reply. I don't know the Pythonic way to keep 3 sets of attributes separate to avoid overlap of HTTP HEADER fields with API PARAM Fields with DATA Fields (from the API request) in the Players class. One can use Dataclasses to avoid extra boilerplate. In the same spirit, encapsulating HTTP_HEADER and API_PARAMS before the Player class would help avoid having to type out a ton of self.attribute_1... self.attribute_20 for "encode http header" and "encode api params" methods. So is that the right way? Or can you use Inheritance/Composition to tidy them into those methods? – Paul May 23 '22 at 18:54
  • I bolded the question within the body of the post. Then amended the Title/Question. How do I embed distinct groups of data within a class that are also mutable objects within that class? – Paul May 23 '22 at 19:10
  • Where is `to_csv` etc. coming from? I don't think the `requests.get` response defines those, for example. – rv.kvetch May 23 '22 at 19:34
  • @rv.kvetch good question. Sorry. You're absolutely right. Requests does have a `.json` method but no to_csv. That's the extra functionality I would want to include on the object. It's not necessarily needed for this question -- as it would be considered another "interface" for Python Design Patterns where likely using the `csv` package in another class, then inheriting or using another method to inject the functionality into that object. (I'm not sure best way to do it) – Paul May 23 '22 at 21:12

0 Answers