Background

=============

Let's say I'm writing some unit tests, and I want to test the re-opening of a log file (it was corrupted for some reason, inside or outside my program). I currently have a TextIOWrapper from the original open() call, which I want to fully delete or "clean up". Once it's cleaned up, I want to call open() again, and I want the id() of the new TextIOWrapper to be different from the old one.

Problem

=============

The new TextIOWrapper seems to come back with the same ID. How do I fully clean this thing up? Is it a lost cause for some reason hidden in the docs?

Debug

=============

My actual code has more try/except blocks for various edge cases, but here's the gist:

import gc  # I don't want to do this

# create log
log = open("log", "w")
id(log)  # result = 01111311110

# close log and delete everything I can think to delete
log.close()
log.__del__()
del log
gc.collect()

# TODO clean up some special way?

# re-open the log
log = open("log", "a")
id(log)  # result = 01111311110

Why is that resulting ID still the same?

Theory 1: Due to the way the IO stream works, the TextIOWrapper will end up in the same place in memory for a given file, and my method of testing this function needs re-work.

Theory 2: Somehow I am not properly cleaning this up.

szofar
  • Interesting. For me the id remains the same even when it's a completely different file the second time. Maybe it's just recycling the id number and it's not the same object at all. – Alexander May 04 '22 at 00:09

1 Answer

I think you do enough clean-up by simply calling log.close(). My hypothesis (now proven, see below) is based on the fact that the example below gives the result you were expecting from the code in your question.

It seems that Python reuses id numbers once the object that held them is gone.

Try this example:

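log = open("log", "w")
print(id(log))  # id of the first file object

# close the log, then re-open it without deleting the name first
log.close()
log = open("log", "a")  # the old object is still bound to `log` while open() runs
print(id(log))  # a different id from the first one
log.close()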
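The two ids differ in the example above because the first object is still alive (still bound to `log`) at the moment the second open() creates its object, and two live objects never share an id. A test can rely on the same fact by holding on to the old handle and comparing identities directly. The following is only a sketch: `reopen_log` is a hypothetical stand-in for the function under test, and the temporary directory is just there to keep the example self-contained.

```python
import os
import tempfile


def reopen_log(path, old_log):
    """Hypothetical helper: close the old handle and return a fresh one."""
    old_log.close()
    return open(path, "a")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log")
    old_log = open(path, "w")
    new_log = reopen_log(path, old_log)  # old_log is still referenced here

    # Both objects are alive, so identity and id() comparisons are reliable.
    is_new_object = new_log is not old_log
    ids_differ = id(new_log) != id(old_log)
    old_closed, new_closed = old_log.closed, new_log.closed
    new_log.close()

print(is_new_object, ids_differ, old_closed, new_closed)
```

Checking `closed` on both handles tests the behaviour that matters (old handle shut, new handle usable) rather than where the object happens to live in memory.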

[edit] I found proof of my hypothesis:

The id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.

In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused.

The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.

Garbage collection is only needed to break cyclic references: objects that reference only one another, with no further references to the cycle. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.

More info on Python's reuse of id values: How unique is Python's id()?
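If the goal is to confirm that the old file object really was destroyed, a weak reference is a more direct check than comparing id() values, because it does not keep the object alive and it reports when the object is gone. This is a minimal sketch that assumes CPython's reference counting; `old_handle_is_released` is a hypothetical name, and on other implementations the object may only disappear after a collection pass.

```python
import os
import tempfile
import weakref


def old_handle_is_released(path):
    """Return True if the first file object is gone before the file is re-opened."""
    log = open(path, "w")
    old_ref = weakref.ref(log)  # weak reference: does not keep the object alive
    log.close()
    del log  # drop the last strong reference

    # In CPython the object is freed as soon as its reference count hits 0,
    # so the weak reference is expected to be dead already.
    released = old_ref() is None

    log = open(path, "a")  # a new object, whatever id() happens to report
    log.close()
    return released


with tempfile.TemporaryDirectory() as tmp:
    result = old_handle_is_released(os.path.join(tmp, "log"))

print(result)
```

No explicit `gc.collect()` or `__del__()` call is involved here, since the file object is not part of a reference cycle.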

Alexander
  • Okay, looks like I will have to find a different way to test this. I'm probably over-thinking the depth of the test. Maybe I could check the tests for the _io package to see what was done there. Actually, your link contained another cool link: [How unique is UUID?](https://stackoverflow.com/questions/1155008/how-unique-is-uuid). It's always amazing when you pull back the curtain on something as innocuous as this and see just how much of your world runs on assumptions built on "pillars of sand". – szofar May 04 '22 at 16:16
  • @szofar Agreed :) – Alexander May 04 '22 at 18:59