
I'm using Python 2 and trying to delete two lists. Here is the code:

test_data1 = [img for img in glob.glob("/location/of/images/*png")]
test_data0 = [img for img in glob.glob("/location/of/other_images/*png")]
test_data = test_data1 + test_data0

Each list of images contains millions of file names, so I would prefer to delete the unnecessary lists after I have created the test_data list, just to make the code "easier" for the computer to run.

How can I do it?

I found a few different ways, but none of them addressed memory issues. I'm not sure whether test_data1=[] actually deletes the list completely from memory.

I'm also afraid that the test_data = test_data1 + test_data0 line only combines references to the two lists, so that when I delete them, test_data also becomes empty.

So, what is the right way?

Really appreciate your help! Sorry if the English is bad, I'm not a native speaker :P

Thanks!

roishik
  • to delete something simply use the `del` keyword – Rptk99 Aug 23 '16 at 15:22
  • look at http://stackoverflow.com/questions/1400608/how-to-empty-a-list-in-python – Rob Aug 23 '16 at 15:25
  • like `del test_data0`. also if you delete the 2 original lists (e.g. `test_data0` and `test_data1`) the final one (`test_data`) will remain intact because it is a new list – Rptk99 Aug 23 '16 at 15:27

4 Answers


You can use in-place list concatenation (+=) so that the intermediate lists are never bound to names of their own:

test_data = []
test_data += [img for img in glob.glob("/location/of/images/*png")]
test_data += [img for img in glob.glob("/location/of/other_images/*png")]

Also I'm not sure what the overall design of your program is, but there is a preference in Python to use iterators/generators instead of lists for just this reason. The less you have to keep in memory at once the better. See if you can redesign your program to just iterate on the fly instead of building up this large list.
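To make the "iterate on the fly" idea concrete, here is a minimal sketch. It assumes the rest of the program can handle one file name at a time. The helper name `iter_images` and the temporary directories are made up for the demo and stand in for the real image folders.

```python
import glob
import itertools
import os
import shutil
import tempfile


def iter_images(*patterns):
    """Lazily yield matching file names, one pattern after another (hypothetical helper)."""
    return itertools.chain.from_iterable(glob.iglob(p) for p in patterns)


# Throwaway directories that stand in for the real image folders.
root = tempfile.mkdtemp()
for sub, names in (("images", ["a.png", "b.png"]), ("other_images", ["c.png"])):
    os.mkdir(os.path.join(root, sub))
    for name in names:
        open(os.path.join(root, sub, name), "w").close()

count = 0
for path in iter_images(os.path.join(root, "images", "*png"),
                        os.path.join(root, "other_images", "*png")):
    count += 1  # process one file name here; no combined list is ever built

shutil.rmtree(root)  # clean up the demo directories
```

The loop never holds a combined list of all file names. How much memory this actually saves depends on how `glob.iglob` lists each directory in your Python version, so it is worth measuring.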

Cory Kramer
  • In particular, [itertools.chain](https://docs.python.org/3/library/itertools.html#itertools.chain) might prove useful in this specific example. – spectras Aug 23 '16 at 15:18
  • You say "remove the need for the intermediate lists" but you're still doing exactly that, creating intermediate lists. – Stefan Pochmann Aug 23 '16 at 15:34
  • It's not like `[img for img in glob.glob("/location/of/images/*png")]` is not a list though. In terms of memory usage it is not different from original code at all – Dmitry Torba Aug 23 '16 at 15:40

You could use extend(). Each list comprehension builds a temporary list, and extend() appends its items to test_data. The temporary list is never bound to a name, so once the call returns, the only list left in memory is test_data rather than several separate lists. Whether that has any tangible effect on performance can only be determined by testing/profiling.

test_data = []
test_data.extend([img for img in glob.glob("/location/of/images/*png")])
test_data.extend([img for img in glob.glob("/location/of/other_images/*png")])

Or use del to remove the binding for that variable; the list itself is freed once nothing else references it.

l = [1,2,3,4,5]
del l  # the name l is removed; the list is freed if nothing else refers to it
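Applied to the question's worry that deleting the source lists might empty test_data, a small sketch with placeholder file names (not real paths) suggests it does not: `+` builds a new list, and `del` only removes the two names.

```python
test_data1 = ["a.png", "b.png"]  # placeholder names, not real files
test_data0 = ["c.png"]
test_data = test_data1 + test_data0  # `+` builds a new, independent list

del test_data1, test_data0  # removes the two names only

# Check that the old name is really gone.
try:
    test_data1
    name_still_bound = True
except NameError:
    name_still_bound = False

# test_data still holds all three file names.
```

The strings themselves are shared between the old and new lists, so they stay alive for as long as test_data refers to them.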
ospahiu

Appending the new data to a single list, as in the other answers, works. But if you want to keep building two lists and adding them, consider using the garbage collector.

Python has a garbage collector that will delete objects when they are no longer in use (i.e. when the object is unreachable and is not referenced any more). For example, if you have the program:

a = [1, 2, 3, 4]
a = []
#  Here data [1, 2, 3, 4] is unreachable (unreferenced)
....

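In CPython, the list [1, 2, 3, 4] is freed as soon as its reference count drops to zero, which here happens when a is rebound. Other implementations may free it later, and the language itself does not guarantee when. Either way it happens automatically and you do not have to do anything.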

However, if you are concerned about memory resources, you can force the garbage collector to run a collection pass using gc.collect() (do not forget to import gc). In CPython this mainly matters for objects caught in reference cycles. For example:

import gc

a = [1, 2, 3, 4]
a = []
gc.collect()
#  Here [1, 2, 3, 4] is unreferenced and a full collection pass has been run
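To see when gc.collect() makes a difference, here is a rough sketch that assumes CPython's reference counting. The class `Node` is made up for the demo, because plain lists cannot be weakly referenced; the weak references are only used to observe whether an object is still alive.

```python
import gc
import weakref


class Node(object):
    """Tiny hypothetical class used only to observe object lifetime."""
    pass


gc.disable()  # keep automatic collection from interfering with the demo

# Case 1: no cycle. Dropping the last reference frees the object at once in CPython.
plain = Node()
plain_ref = weakref.ref(plain)
del plain
freed_without_gc = plain_ref() is None

# Case 2: a reference cycle keeps the object alive until the collector runs.
cyclic = Node()
cyclic.me = cyclic  # the object refers to itself
cyclic_ref = weakref.ref(cyclic)
del cyclic
alive_before_collect = cyclic_ref() is not None
gc.collect()
freed_after_collect = cyclic_ref() is None

gc.enable()
```

A plain list of file-name strings contains no cycles, so under CPython it should behave like case 1 and be freed without an explicit gc.collect().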

So your program will turn into

import gc
import glob

test_data1 = [img for img in glob.glob("/location/of/images/*png")]
test_data0 = [img for img in glob.glob("/location/of/other_images/*png")]
test_data = test_data1 + test_data0

test_data1 = []
test_data0 = []

gc.collect()
Dmitry Torba

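In fact, each list stores references to the strings, not the strings themselves.

I'm pretty sure the memory used by a list itself is about 4 bytes per entry (on a 32-bit architecture) or 8 bytes per entry (on a 64-bit architecture), i.e. roughly 1M x 4 or 1M x 8 bytes for a million file names.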
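A quick way to check that estimate on your own machine is sketched below. It uses a list of one million `None` placeholders rather than real file names, and `sys.getsizeof` reports only the list object itself, not the strings it would point to.

```python
import struct
import sys

n = 1000000  # one million placeholder entries
pointer_size = struct.calcsize("P")  # 4 on 32-bit builds, 8 on 64-bit builds

names = [None] * n  # stand-in for a list of one million file names
list_bytes = sys.getsizeof(names)  # size of the list object only, not its items

print("pointer size: %d bytes" % pointer_size)
print("list of %d references: about %.1f MB" % (n, list_bytes / 1e6))
```

The file-name strings take additional memory on top of this, which is why profiling the real program is still worthwhile.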

I suggest profiling; see "Which Python memory profiler is recommended?".

You can use glob.iglob to get iterators instead of lists and chain the iterators with itertools.chain, as below:

import itertools
import glob

iter1 = glob.iglob("/location/of/images/*png")
iter2 = glob.iglob("/location/of/other_images/*png")

test_data = list(itertools.chain(iter1, iter2))
Laurent LAPORTE