Problem: I am currently working on clustering data and have found a weird behavior in my Jupyter notebooks. All seeds are fixed, and executing some or all of the code multiple times produces stable results. Restarting the kernel, however, changes the results. The new results are then stable again for as long as the kernel stays alive.

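```python
import os
import random

import numpy as np


def fix_seeds(seed=1234):
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


# Seed before the remaining imports, in case those modules draw random numbers on import
fix_seeds()

# Other imports (only depending on random and/or numpy)
```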
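A minimal sketch of what the seeding above does guarantee, assuming NumPy is installed (the helper `draw_sample` is hypothetical, added only for the comparison): within one interpreter, re-seeding resets both generators, so re-running cells yields identical draws. Note that seeding `random` and `numpy` does not control the per-process hash seed Python uses for `str` objects.

```python
import random

import numpy as np


def fix_seeds(seed=1234):
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


def draw_sample():
    # Hypothetical helper: draw from both generators so runs can be compared.
    return random.random(), np.random.rand(3)


fix_seeds()
first_py, first_np = draw_sample()

fix_seeds()  # re-seeding resets both generators to the same state
second_py, second_np = draw_sample()

# Same seed in the same process -> identical draws.
print(first_py == second_py, np.array_equal(first_np, second_np))  # True True
```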

So my question: how, or at which point, is this randomness introduced, and how can I fix it?

MyNameIsFu
  • I doubt that this is related to copying the file. If you terminate the kernel and restart it, does it still produce the same results? – Thomas May 13 '22 at 09:41
  • Yes, I tried that a few times. It seems that renaming the file (adding a 'copy' tag) is the relevant difference. – MyNameIsFu May 13 '22 at 09:44
  • Okay, I just tested renaming the file: different clusters as well. – MyNameIsFu May 13 '22 at 09:44
  • What version of Python are you using? The ordering of dicts is only guaranteed since version 3.7. On earlier versions, the order [might be different per session](https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions). But if a kernel restart doesn't affect the results, or you're using Python 3.7+, then it must be something else. – Thomas May 13 '22 at 09:46
  • I'm using Python 3.8.8 and ipykernel 5.3.4 – MyNameIsFu May 13 '22 at 09:47
  • Hmm, then I don't know. You'd need to create a minimal, self-contained example that reproduces the problem. Don't forget to list your exact package versions (e.g. `pip freeze` output). – Thomas May 13 '22 at 09:48
  • Please provide enough code so others can better understand or reproduce the problem. – Community May 13 '22 at 10:45
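Thomas's point about ordering can be checked with a small sketch: it runs the same snippet in two fresh interpreters with different `PYTHONHASHSEED` values (1 and 2 are arbitrary choices) and compares the iteration order of a dict and of a set built from the same strings. The word list is made up for illustration.

```python
import os
import subprocess
import sys

# Child program: print the dict key order and the set order for the same strings.
CHILD = """
words = ["kmeans", "dbscan", "agglo", "spectral", "birch", "optics"]
d = dict.fromkeys(words)
print(list(d))
print(list(set(words)))
"""


def run_with_hash_seed(seed):
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    dict_order, set_order = result.stdout.splitlines()
    return dict_order, set_order


dict_a, set_a = run_with_hash_seed(1)
dict_b, set_b = run_with_hash_seed(2)

# Python 3.7+: dicts keep insertion order regardless of the hash seed.
print(dict_a == dict_b)  # True
# Set order follows the (seeded) string hashes, so this may come out either way.
print(set_a == set_b)
```

On 3.7+ the dict line always matches; whether the set line matches depends on how these particular strings hash under each seed. That per-session variation is one way hashing can still leak into results even when dict ordering is guaranteed.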

1 Answer

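Apparently, Python 3.3 and later enable hash randomization by default: the hashes of `str` and `bytes` objects are salted with a random seed chosen when the interpreter starts, to protect against hash-collision denial-of-service attacks. Because the iteration order of a `set` depends on those hashes, anything derived from set order can change from one interpreter session to the next, even with `random` and `numpy` seeded. Fixing the hash seed (e.g. running `PYTHONHASHSEED=0 python3 <file>.py`) solves my problem. The same applies to Jupyter kernels: each kernel generates its own hash seed on startup, which is why the results only change after a restart.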
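A minimal sketch of the effect, run from Python rather than the shell: it starts fresh interpreters with and without `PYTHONHASHSEED=0` and compares what they report. The string `"cluster"` and the set contents are arbitrary examples. `sys.flags.hash_randomization` is `0` when `PYTHONHASHSEED=0` disables randomization.

```python
import os
import subprocess
import sys

# Child program: report the hash-randomization flag, the hash of one string,
# and the iteration order of a small set of strings.
CHILD = """
import sys
print(sys.flags.hash_randomization)
print(hash("cluster"))
print(list({"alpha", "beta", "gamma", "delta"}))
"""


def run_child(hash_seed=None):
    """Run CHILD in a fresh interpreter and return its output lines.

    If hash_seed is None, PYTHONHASHSEED is removed from the environment so
    the child uses the default (randomized) hash seed.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = str(hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


pinned_1 = run_child(0)
pinned_2 = run_child(0)
default_1 = run_child()
default_2 = run_child()

# PYTHONHASHSEED=0 disables randomization: both children print the same lines.
print(pinned_1 == pinned_2)  # True
print(pinned_1[0])  # 0 -> hash randomization is off
# Default: each child picks its own seed, so the hash of "cluster" usually differs.
print(default_1[0], default_1[1] == default_2[1])
```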

Source: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED
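A complementary option (not part of the original answer, shown here as a sketch) is to keep hash order out of the pipeline entirely: wherever a set of strings feeds the clustering, convert it into a deterministic order first. The helper names below are hypothetical.

```python
def stable_unique(items):
    """Return the distinct items in sorted order.

    The iteration order of a set of str depends on their (salted) hashes and
    can change between interpreter sessions; sorting removes that dependence.
    """
    return sorted(set(items))


def first_seen_unique(items):
    """Return the distinct items in order of first appearance.

    Relies on dicts preserving insertion order (guaranteed since Python 3.7).
    """
    return list(dict.fromkeys(items))


labels = ["b", "a", "c", "a", "b"]
print(stable_unique(labels))      # ['a', 'b', 'c']
print(first_seen_unique(labels))  # ['b', 'a', 'c']
```

Sorting requires the items to be mutually comparable; `dict.fromkeys` only needs them to be hashable and keeps the original encounter order.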

MyNameIsFu