
In a Python script I load a dataset into an environment variable, which uses up almost all of the system memory.

Let's say I have done something with that data and I am ready to dismiss it and load another dataset of similar size.

Because of Python's internal garbage collector and its reference system, it is not trivial to remove this variable from the environment and release its memory so that a new dataset can be loaded.

What is the best way of doing this?

hirschme
  • Use `del variable` – AmourK Dec 20 '18 at 21:16
  • Passing large amounts of data via the environment is a bad idea. Use a pipe or a socket to transfer the data to a child process instead. – chepner Dec 20 '18 at 21:17
  • " it is not trivial to remove this variable from the environment and thus release the memory to be able to load a new variable." It's not? Usually, I never even think about explicitly freeing objects; it is taken care of for me. Are you simply dumping a bunch of objects into the global scope, each of which has references to this large data object? – juanpa.arrivillaga Dec 20 '18 at 21:17
  • Just reassign the variable to the second dataset and let the garbage collector take care of everything. If both datasets won't fit in RAM at the same time, delete the variable reference first. – Eb946207 Dec 20 '18 at 21:19
  • @juanpa.arrivillaga He's putting (or at least claiming to put) the data in an environment, which isn't subject to Python's garbage collection. – chepner Dec 20 '18 at 21:19
  • @chepner I'm not sure how to interpret what the OP is saying by "I load a dataset into a environment variable". Like, are they doing something like `data= python my_script.py`? – juanpa.arrivillaga Dec 20 '18 at 21:20
  • Are you sure you mean "an _environment_ variable"? Just asking because without any code to see, I'm not convinced you're actually talking about environment variables here. I'm not sure exactly how environment variable cleanup works, but I'd agree that it is a bad idea to put more than a few KB into an environment variable. – erik258 Dec 20 '18 at 21:21
  • @puhs that is not a duplicate of that (I think mine is) because the reference must be deleted **before** it is collected, which usually happens automatically. – Eb946207 Dec 20 '18 at 21:21
  • @juanpa.arrivillaga Possibly, or `os.environ['data'] = foo()`. It's an important point to clear up before we can offer any real solution. – chepner Dec 20 '18 at 21:22
  • @chepner IIRC mutating `os.environ` doesn't actually do anything; the dict returned by that is created once and doesn't do anything if you mutate it. But totally agreed. – juanpa.arrivillaga Dec 20 '18 at 21:23
  • @hirschme Garbage collection doesn't apply to the environment, if that's where you are truly putting the data. You're going to have to show some code that demonstrates what you are doing. – chepner Dec 20 '18 at 21:23
  • @DanFarrell you are probably right. When I say 'environment' I am just referring to the fact that the variable is one of the variables in my environment, and that I wish to remove only that variable while keeping the other variables in the environment. I am not referring to special subenvironments. – hirschme Dec 20 '18 at 21:24
  • @juanpa.arrivillaga It depends on how `putenv` is implemented on your system, but that's just an example. Perhaps he's using the `env` argument to `subprocess.Popen`; the point is, we have to establish whether or not the environment is really being used. – chepner Dec 20 '18 at 21:25
  • @hirschme dude, *what do you mean by environment*? Do you mean like an OS system environment variable? Probably not, right? Anyway, you need to tell us exactly what the problem is. There is no way to say "delete this object from memory" in Python. I work with giant data-sets that go in and out of memory all the time, and I've never had to worry about that. What exactly are you doing that makes it so your object isn't reclaimed when it isn't being used any more? – juanpa.arrivillaga Dec 20 '18 at 21:26
  • @hirschme As far as I know, you can't release memory used for the environment, as it was allocated by your *parent* and populated before you began to run. You can change the value of a large variable to be an empty string, and *maybe* the now-unused memory is available for use by a new variable, but it's best to just not use the environment for large amounts of data in the first place. – chepner Dec 20 '18 at 21:27
  • @juanpa.arrivillaga gonna take a guess; an IDE supporting IPython. – roganjosh Dec 20 '18 at 21:36
  • @roganjosh yeah. Or maybe they mean something like the Spyder variable explorer. I don't know. Especially since they say "without closing the session". I am pretty sure they don't actually mean environment variables, simply because it doesn't make a lot of sense. – juanpa.arrivillaga Dec 20 '18 at 21:38
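
If the data actually lives in an ordinary Python object (not in `os.environ`), the `del variable` and reassignment suggestions above come down to dropping every strong reference to it. A minimal sketch, assuming a plain script: `Dataset` and `load_dataset` are hypothetical stand-ins for your own loading code, and a `weakref` probe reports whether the object has really been freed. Freed memory becomes reusable by the interpreter; whether it is handed back to the operating system depends on the allocator.

```python
import gc
import weakref


class Dataset:
    """Hypothetical stand-in for a large dataset (weakref-able, unlike bytearray)."""

    def __init__(self, n_bytes):
        self.payload = bytearray(n_bytes)


def load_dataset(n_bytes=10_000_000):
    """Hypothetical loader; replace with your real loading code."""
    return Dataset(n_bytes)


# Case 1: the name `data` is the only reference, so `del` frees the object.
data = load_dataset()
probe = weakref.ref(data)      # a weak reference does not keep the object alive
del data                       # drop the only strong reference
gc.collect()                   # only matters for reference cycles; harmless otherwise
freed_after_del = probe() is None

# Case 2: a second, easy-to-forget reference keeps the object alive.
data = load_dataset()
probe = weakref.ref(data)
history = [data]               # e.g. an output cache or a list you appended to
del data
gc.collect()
alive_while_referenced = probe() is not None
history.clear()                # remove the last strong reference
gc.collect()
freed_after_clear = probe() is None

print(freed_after_del, alive_while_referenced, freed_after_clear)  # True True True

# Only now load the next dataset, so the two never coexist in memory.
data = load_dataset()
```

Case 2 is the usual reason memory "isn't released": some other container (an IDE's output history, a cache, a closure) still refers to the data.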

1 Answer


You can remove an environment variable `foo` using either `os.unsetenv('foo')` (if your operating system supports it) or `del os.environ['foo']`, but the memory formerly used by `foo` can only be reused for other environment variables. The environment is a separate area of memory from the Python heap and is not subject to garbage collection.
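
A minimal sketch of the removal step, using a hypothetical variable name `DEMO_LARGE_VAR`. It asks a fresh child interpreter whether it still inherits the variable before and after `del os.environ[...]`; it does not measure memory.

```python
import os
import subprocess
import sys

NAME = "DEMO_LARGE_VAR"  # hypothetical variable name used for illustration


def child_sees(name):
    """Ask a fresh Python child process whether it inherited `name`."""
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print({name!r} in os.environ)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


os.environ[NAME] = "x" * 1000          # assignment also calls putenv()
inherited_before = child_sees(NAME)    # True: children inherit the variable

del os.environ[NAME]                   # deletion also calls unsetenv()
inherited_after = child_sees(NAME)     # False: the variable is gone

# pop() with a default is the non-raising variant when the name may be absent.
os.environ.pop(NAME, None)

print(inherited_before, inherited_after)  # True False
```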

`os.environ` is just a mapping of the environment, which is an array of null-terminated strings of the form `<name>=<value>`, where `<name>` may contain any character except `=` or the null character, and `<value>` may contain any character except the null character. Using `del` ensures that both the environment and `os.environ` are updated; calling `unsetenv` directly only affects the environment. As such, `del os.environ['foo']` is preferred.
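
A small sketch of why deleting through `os.environ` is preferred, using a hypothetical variable name `DEMO_STALE_VAR`. It assumes `os.unsetenv` is available, which it always is from Python 3.9 on (earlier versions only had it on most Unix flavors).

```python
import os

NAME = "DEMO_STALE_VAR"  # hypothetical variable name used for illustration

os.environ[NAME] = "value"

# Calling os.unsetenv() directly removes the variable from the process
# environment, but the os.environ mapping is not told about it.
os.unsetenv(NAME)
stale = NAME in os.environ            # True: os.environ is now out of sync

# Deleting through os.environ updates both the mapping and the environment.
del os.environ[NAME]
in_sync = NAME not in os.environ      # True

print(stale, in_sync)  # True True
```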

You should find a different way to pass data to your child process, and keep your environment small.
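
One way to follow that advice, sketched with the standard `subprocess` module: send the data to the child over its stdin pipe instead of through the environment. The 5 MB payload and the inline child program are placeholders for your own data and script.

```python
import subprocess
import sys

# Hypothetical child program: reads all of stdin and reports its size.
CHILD_CODE = "import sys; data = sys.stdin.buffer.read(); print(len(data))"

payload = b"\x00" * 5_000_000  # stand-in for a large dataset (5 MB)

# subprocess.run() writes `input` to the child's stdin through a pipe,
# so the data never touches the environment.
result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    input=payload,
    capture_output=True,
    check=True,
)
received = int(result.stdout.decode().strip())
print(received)  # 5000000
```

The child reads `sys.stdin.buffer`, so the bytes arrive unchanged; unlike an environment value, the payload may contain NUL bytes, and on typical systems it is not bound by the limits on environment size. A socket works along the same lines when data has to flow in both directions.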

chepner