
I need to generate around a few hundred UUIDs, and then I need to reuse each UUID a few thousand times.

What will give me better performance?

Option 1: Generate the UUID every time from the input.

Option 2: Use Python's lru_cache(maxsize=None) around the method generating the UUID.

Option 3: Store the UUID in a dictionary and retrieve it (a primitive cache).

RaamEE

2 Answers


Running 1,000,000 iterations for each option resulted in these times:

WITHOUT CACHE: 1.53233003616333

WITH CACHE: 0.06612777709960938

WITH DICT: 0.051396846771240234

So the fastest option is using a dict, but lru_cache is only milliseconds behind it.

import time
import uuid
from functools import lru_cache


# Running 1,000,000 iterations resulted in these times
# WITHOUT CACHE: 1.53233003616333
# WITH CACHE: 0.06612777709960938
# WITH DICT: 0.051396846771240234

@lru_cache(maxsize=None)
def generate_uuid_v5(name):
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))


# ----------------------------------------
start = time.time()

# Calculate a UUID version 5 (without caching)
for i in range(1_000_000):
    uuid_without_cache = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

end = time.time()
delta = end - start
print(f"WITHOUT CACHE: {delta}")

# ----------------------------------------

start = time.time()

# Calculate a UUID version 5 (with lru_cache)
for i in range(1_000_000):
    uuid_with_cache = generate_uuid_v5("example.com")

end = time.time()
delta = end - start
print(f"WITH CACHE: {delta}")

# ----------------------------------------

start = time.time()

uuids_dict: dict = {"Thread-1": generate_uuid_v5("example.com")}
# Retrieve a UUID version 5 from the dict
for i in range(1_000_000):
    uuid_from_dict = uuids_dict["Thread-1"]

end = time.time()
delta = end - start
print(f"WITH DICT: {delta}")

Surprisingly, this doesn't do worse with many distinct keys. In this example I ran 1,000 lookups for each of 1,000 UUIDs:

Running 1,000,000 iterations resulted in these times

WITHOUT CACHE: 1.550447940826416

WITH CACHE: 0.06554079055786133

WITH DICT: 0.051934003829956055

import time
import uuid
from functools import lru_cache


# Running 1,000,000 iterations resulted in these times
# WITHOUT CACHE: 1.550447940826416
# WITH CACHE: 0.06554079055786133
# WITH DICT: 0.051934003829956055

@lru_cache(maxsize=None)
def generate_uuid_v5(name):
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))


# ----------------------------------------
start = time.time()

# Calculate a UUID version 5 (without caching)
for name in range(1_000):
    _name: str = str(name)
    for i in range(1_000):
        uuid_without_cache = uuid.uuid5(uuid.NAMESPACE_DNS, _name)

end = time.time()
delta = end - start
print(f"WITHOUT CACHE: {delta}")

# ----------------------------------------

start = time.time()

# Calculate a UUID version 5 (with lru_cache)
for name in range(1_000):
    _name: str = str(name)
    for i in range(1_000):
        uuid_with_cache = generate_uuid_v5(_name)

end = time.time()
delta = end - start
print(f"WITH CACHE: {delta}")

# ----------------------------------------

start = time.time()

uuids_dict: dict = {}
# Retrieve a UUID version 5 from the dict (primitive cache)
for name in range(1_000):
    _name: str = str(name)
    uuids_dict[_name] = generate_uuid_v5(_name)
    for i in range(1_000):
        uuid_from_dict = uuids_dict[_name]

end = time.time()
delta = end - start
print(f"WITH DICT: {delta}")
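For measurements this short, the timeit module is a bit more robust than hand-rolled time.time() loops, since it uses the highest-resolution clock and avoids per-loop overhead in the measured code. A minimal sketch of the same three-way comparison (smaller iteration count, relative ordering is what matters):

```python
import timeit
import uuid
from functools import lru_cache


@lru_cache(maxsize=None)
def generate_uuid_v5(name):
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))


# Pre-populate the primitive dict cache once.
uuids_dict = {"example.com": generate_uuid_v5("example.com")}

# Time 100,000 repetitions of each option.
no_cache = timeit.timeit(
    lambda: uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"), number=100_000
)
with_cache = timeit.timeit(lambda: generate_uuid_v5("example.com"), number=100_000)
with_dict = timeit.timeit(lambda: uuids_dict["example.com"], number=100_000)

print(f"WITHOUT CACHE: {no_cache}")
print(f"WITH CACHE: {with_cache}")
print(f"WITH DICT: {with_dict}")
```

The absolute numbers will differ from machine to machine, but both cached variants should come out well ahead of recomputing the UUID each time.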
RaamEE

using LRU cache or retrieve from a dict

The source code of functools, including lru_cache, is available for inspection on GitHub. The actual implementation is _lru_cache_wrapper, and in every case the cache is created as a plain dict:

cache = {}

and relevant wrapper for maxsize=None is

def wrapper(*args, **kwds):
    # Simple caching without ordering or size limit
    nonlocal hits, misses
    key = make_key(args, kwds, typed)
    result = cache_get(key, sentinel)
    if result is not sentinel:
        hits += 1
        return result
    misses += 1
    result = user_function(*args, **kwds)
    cache[key] = result
    return result

So using lru_cache means retrieving from a dict plus some accounting for hits and misses. You might therefore expect it to take slightly more time than a plain dict retrieval, though the accounting should not be resource-hungry.
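That hit/miss accounting is visible through cache_info(), which lru_cache attaches to the wrapped function. A small sketch showing that the second call for the same name is served from the internal dict:

```python
import uuid
from functools import lru_cache


@lru_cache(maxsize=None)
def generate_uuid_v5(name):
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, name))


generate_uuid_v5("example.com")  # miss: computed and stored in the cache dict
generate_uuid_v5("example.com")  # hit: returned straight from the cache dict
info = generate_uuid_v5.cache_info()
print(info)  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```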

Daweo