
I manage several user accounts on a website that exposes an API, and I regularly retrieve some information for every user.

To retrieve this information, I use a Python script that loads user data from a database and then uses an API connector to make the requests.

The endpoints I use are private. To authenticate, I first make a request to a specific endpoint with the user's api_key and api_secret as parameters. The response contains an access_token, which is then used to authenticate the user on the private endpoints.

This token is sent in the request headers and must be refreshed regularly.

The connector works well, but I recently tried to use it in a multi-threaded context: instead of looping over users, I launch one thread per user and join them afterwards.

In a multi-threaded context the connector also works, but on rare occasions I noticed that user data got mixed up.

After more debugging, I found that in those cases the connector was using the access_token of another user.


I reproduced the issue with a simple example that shows the logic of the script.

#!/usr/bin/python3

from utils.database import Database
from urllib.parse import urlencode
import threading
import requests
import time

class User():
    def __init__(self, user_id, database):
        self.user_id = user_id
        self.database = database
        self.connector = None

    def get_connector(self):
        if not self.connector:
            self.__create_connector()
        return self.connector

    def __create_connector(self):
        __user_api_df = self.database.get_table("users_apis", where=f"WHERE user_id = {self.user_id}")
        api_key = __user_api_df["api_key"].values[0]
        api_secret = __user_api_df["api_secret"].values[0]
        self.connector = ApiConnector(api_key, api_secret, self)

    def __str__(self):
        return f"User-{self.user_id}"

class ApiConnector():

    def __init__(self, api_key, api_secret, user=None):
        self.base_url = "https://www.api.website.com"
        self.api_key = api_key
        self.api_secret = api_secret
        self.user = user
        self.session = requests.Session()
        self.__auth_token = None
        self.__auth_timeout = None

    def api_call_1(self):
        return self.__request("GET", "endpoint_path_1", auth=True)

    def api_call_2(self):
        return self.__request("GET", "endpoint_path_2", auth=True)

    def api_call_3(self):
        return self.__request("GET", "endpoint_path_3", auth=True)

    def __request(self, method, path, payload={}, auth=False, headers={}):
        
        url = f"{self.base_url}{path}"
        
        headers["Accept"] = "application/json"
        
        if auth:
            if not self.__is_authenticated():
                self.__authenticate()
            headers["Authorization"] = "Bearer " + self.__auth_token
        
            print(f"[{self.user}] IN => {path} - {self.__auth_token}")
        
        if method == "GET":
            payload_str = f"?{urlencode(payload)}" if payload else ""
            response = self.session.request(method, f"{url}{payload_str}", headers=headers)
        else:
            response = self.session.request(method, url, params=payload, headers=headers)

        if auth:
            print(f"[{self.user}] OUT => {path} - {response.request.headers['Authorization']}")
        
        return response.json()

    def __authenticate(self):
        response = self.__request("GET", "authentication_endpoint", payload={
            "api_key": self.api_key,
            "api_secret": self.api_secret
        })
        self.__auth_token = response["result"]["access_token"]
        self.__auth_timeout = time.time() + response["result"]["expires_in"]

    def __is_authenticated(self):
        if not self.__auth_timeout:
            return False
        if self.__auth_timeout < time.time():
            return False
        return True

class RequestsTester:
    
    def __init__(self):
        self.database = Database("host",
                                 "user",
                                 "password",
                                 "database")
        
        self.user_ids = [1, 2, 3]
        
        self.threads = {}
        
    def run(self):
        
        for user_id in self.user_ids:

            user = User(user_id, self.database)
            
            thread_name = f"Thread-{user_id}"
            self.threads[thread_name] = threading.Thread(target=self.get_data, args=[user])
            self.threads[thread_name].start()

        for thread_name in self.threads.keys():
            self.threads[thread_name].join()

    def get_data(self, user):
        user.get_connector().api_call_1()
        user.get_connector().api_call_2()
        user.get_connector().api_call_3()
    

if __name__ == "__main__":
    RequestsTester().run()

Note 1: I didn't include the Database class since it isn't relevant here, but every one of its methods is protected by a mutex to avoid concurrent access.

Note 2: I'm using Python 3.9.2 and requests 2.25.1.


Before making the call I print the access_token, and after the call I print the Authorization header of the request attached to the response.

The output generally looks like this:

[User-1] IN => /private/endpoint_path_1 - 1673482029231.1EPZ7Ya-
[User-3] IN => /private/endpoint_path_1 - 1673482029265.1Cdx06z2
[User-2] IN => /private/endpoint_path_1 - 1673482029284.1JrX_wyQ
[User-3] OUT => /private/endpoint_path_1 - Bearer 1673482029265.1Cdx06z2
[User-1] OUT => /private/endpoint_path_1 - Bearer 1673482029231.1EPZ7Ya-
[User-2] OUT => /private/endpoint_path_1 - Bearer 1673482029284.1JrX_wyQ

But on rare occasions it looks like this:

[User-1] IN => /private/endpoint_path_1 - 1673482029231.1EPZ7Ya-
[User-3] IN => /private/endpoint_path_1 - 1673482029265.1Cdx06z2
[User-2] IN => /private/endpoint_path_1 - 1673482029284.1JrX_wyQ
[User-3] OUT => /private/endpoint_path_1 - Bearer 1673482029231.1EPZ7Ya-
[User-1] OUT => /private/endpoint_path_1 - Bearer 1673482029231.1EPZ7Ya-
[User-2] OUT => /private/endpoint_path_1 - Bearer 1673482029284.1JrX_wyQ

The outgoing access token is not the same as the incoming one: the token of another user is sent instead.


This minimal example only illustrates how the script works. In real conditions I have far more than 3 users, and get_data does more than make API calls: it also processes data and stores results in the database.

Every time this error occurs, the incoming token is the correct one but the outgoing token belongs to another user, so the issue seems to come from the requests library.

If I use a loop instead of launching threads, the error never occurs, so it seems to be tied to the multi-threaded context.

From what I have read, the requests library and its Session class are supposed to be thread-safe, so I don't understand where this error can come from.

I'm not experienced with Python multi-threading, so I may be doing something wrong, but I can't find what.

Has anybody already run into requests mixing up headers in a multi-threaded context?

Arkaik
  • I'd say 75% chance the `{}` default value here `def __request(self, method, path, payload={}, auth=False, headers={}):` is your problem https://stackoverflow.com/questions/26320899/why-is-the-empty-dictionary-a-dangerous-default-value-in-python – Macattack Jan 12 '23 at 00:28
  • @Macattack I checked your link and it's a very strange behavior I was not aware of. With `None` instead of `{}` the issue doesn't seem to come back, so I guess it was the empty dict that mixed up the headers. Thanks a lot. However, to understand what was really happening, I tried to reproduce it without `requests` or any of the classes shown above, and I couldn't produce a failing case with an empty dict, since it's always the same field being overwritten. So I still don't understand how it fails. – Arkaik Jan 13 '23 at 17:08

1 Answer


The root cause is not specific to threads: the `{}` default in `__request` is one dict shared by every call. Here's a demo with no threads:

class Myclass:
    def fn(self, key, value, problem={}):
        print('Before', problem)
        problem[key] = value
        print('After', problem)

a = Myclass()
a.fn(1,2)
b = Myclass()
b.fn(2,3)
c = Myclass()
c.fn(3,4)

The output of this is:

Before {}
After {1: 2}
Before {1: 2}
After {1: 2, 2: 3}
Before {1: 2, 2: 3}
After {1: 2, 2: 3, 3: 4}

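As you can see, even though we create three instances of Myclass, every call gets the exact same dict, and that dict keeps the modifications made by earlier calls. This is because a default value is evaluated once, when the function is defined, not on each call.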
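To make the sharing visible, here is a minimal sketch that reuses the demo class above (with a `return` added for convenience, which is my own change) and checks object identity. The variable names `first`, `second` and `stored_default` are only for illustration.

```python
class Myclass:
    def fn(self, key, value, problem={}):
        problem[key] = value
        return problem


# Two different instances, neither passes `problem`.
first = Myclass().fn(1, 2)
second = Myclass().fn(2, 3)

# Both calls returned the very same dict object...
same_object = first is second
# ...which is the one stored on the function when it was defined.
stored_default = Myclass.fn.__defaults__[0]

print(same_object)              # True
print(stored_default is first)  # True
print(stored_default)           # {1: 2, 2: 3}
```

The dict lives on the function object, not on the instance, which is why creating a new `Myclass` (or a new `ApiConnector` per user) does not give a fresh one.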

You should not use a dict as a default value. Instead, default to None and create the dict inside the function:

def fn(self, key, value, problem=None):
    if problem is None:
        problem = {}
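As for why the mix-up in the question only shows up with threads: in a loop, each call overwrites the shared `Authorization` entry right before using it, so the stale value is never observed. With threads, another thread can overwrite the entry between the moment it is set and the moment the request is sent. The sketch below is an assumption-laden stand-in, not the asker's code: it does not use `requests`, the names `send_shared`, `send_fixed` and `run` are hypothetical, and a `threading.Barrier` replaces the real timing window so the race happens every time.

```python
import threading


def send_shared(token, barrier, headers={}):
    # Bug kept on purpose: every call without `headers` reuses one dict.
    headers["Authorization"] = f"Bearer {token}"
    # Stand-in for the gap between setting the header and sending the request:
    # wait until every thread has written its own token.
    barrier.wait()
    return headers["Authorization"]


def send_fixed(token, barrier, headers=None):
    if headers is None:
        headers = {}  # fresh dict per call
    headers["Authorization"] = f"Bearer {token}"
    barrier.wait()
    return headers["Authorization"]


def run(send, tokens):
    """Call `send` once per token, each in its own thread."""
    barrier = threading.Barrier(len(tokens))
    sent = {}

    def worker(token):
        sent[token] = send(token, barrier)

    threads = [threading.Thread(target=worker, args=(t,)) for t in tokens]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent


tokens = ["token-user-1", "token-user-2", "token-user-3"]
shared_result = run(send_shared, tokens)
fixed_result = run(send_fixed, tokens)

mixed_up = [t for t, h in shared_result.items() if h != f"Bearer {t}"]
fixed_ok = all(h == f"Bearer {t}" for t, h in fixed_result.items())

print(len(mixed_up))  # 2: only the last thread to write keeps its own token
print(fixed_ok)       # True
```

In the real script the window is much narrower than a barrier, which would explain why the wrong token only appears on rare occasions.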
Macattack