I'm trying to understand conceptually how several Jupyter notebooks running Spark kernels (such as SparkMagic) can share a cluster of worker nodes.
If User A persists or caches a large RDD (whether on disk or in memory) in a cell, and then goes away for the weekend without stopping the notebook, will this degrade other users' ability to run their jobs while User A's notebook is still running?
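To make it concrete, here is a minimal sketch of the kind of cell I mean (the S3 path and names are just placeholders, not our real job):

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()

# User A loads a large dataset, persists it, and then leaves the
# notebook (and its Spark application) running over the weekend.
big_df = spark.read.parquet("s3://example-bucket/large-dataset/")
big_df.persist(StorageLevel.MEMORY_AND_DISK)
big_df.count()  # action to materialize the cache
```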
That is, all the Spark notebooks sharing the cluster can submit jobs at the same time (they do not have to run sequentially), but the cluster's resources get divided up among them, right? See the SparkMagic cell below for how I currently cap resources per session.
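For context, each notebook starts its own Livy session, and I can set per-session resource limits with a SparkMagic `%%configure` cell like the one below (the numbers are only illustrative). What I'm unsure about is whether the memory/disk held by an idle session's cached data still reduces what the other sessions can get.

```python
%%configure -f
{
    "executorMemory": "4g",
    "executorCores": 2,
    "numExecutors": 4
}
```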
This is a general question, but in our case we're running in an AWS SageMaker and EMR environment in a US region, in case that makes a difference.