0

I've written a short Python script to read in a 12GB file:

import time

start = time.time()
my_file = open('my_12GB_file.txt', 'rb')
my_file_lines = set(my_file.readlines())
end = time.time()
print "Time elapsed: %r" % (end - start)
my_file.close()

The script reads in the file, prints time elapsed, and then stalls (as if it has entered an infinite loop). Any ideas about what could be going wrong?

Update:

Program terminated after I changed:

my_file_lines = set(my_file.readlines())

to

my_file_lines = my_file.readlines()
Brinley
  • What do you mean by _stalls_? Does it not terminate? – Ma0 Aug 10 '17 at 15:30
  • doesn't terminate – Brinley Aug 10 '17 at 15:30
  • Do you have 32 bit python? – nick_gabpe Aug 10 '17 at 15:31
  • what happens if you move the `my_file.close()` before the `end = time.time()`? Does the elapsed time get printed? – Ma0 Aug 10 '17 at 15:32
  • I think that you should be patient. Even if at the end you will probably get a [`MemoryError`](https://docs.python.org/2/library/exceptions.html?highlight=memoryerror#exceptions.MemoryError). See, e.g., [this](https://stackoverflow.com/questions/5537618/memory-errors-and-list-limits) to overcome this situation. – keepAlive Aug 10 '17 at 15:32
  • @nick_gabpe 64 bit – Brinley Aug 10 '17 at 15:33
  • Have you tried adding `sys.exit(0)` at the end? – Adam W. Cooper Aug 10 '17 at 15:35
  • The issue is probably that your process is unable to access enough real memory to read the whole file in without paging (sending some content to backing store to cope with the insufficiency of real memory). This would make the read take far longer than you might expect. Leave it overnight (but you may still see a `MemoryError` exception). – holdenweb Aug 10 '17 at 15:35

2 Answers

1

When reading files, and especially large ones, it is strongly recommended to use Python's built-in `with` statement:

with open("my_12GB_file.txt") as large_file:
    for line in large_file:
        do_something(line)

`with` takes care of closing the file when you are done or if an error occurs. Reading the file line by line also avoids loading the whole file into memory, which may be the problem you have.
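As a rough sketch of how this could apply to the question's goal of collecting the distinct lines, the set can be filled while iterating, so the intermediate list from `readlines()` is never built. The function name `unique_lines` and the small demo file are assumptions made up for illustration. Note that the set still has to hold every distinct line, so memory use depends on how many lines are unique.

```python
import os
import tempfile


def unique_lines(path):
    """Collect the distinct lines of a file while reading it one line at a time."""
    seen = set()
    # Binary mode mirrors the question's open(..., 'rb'); each line is a bytes object.
    with open(path, "rb") as large_file:
        for line in large_file:
            seen.add(line)
    return seen


# Small stand-in for the 12 GB file (hypothetical demo data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as demo:
    demo.write(b"a\nb\na\nc\nb\n")

lines = unique_lines(demo_path)
print(len(lines))
```

If only a per-line computation is needed, the `seen` set can be dropped entirely and each line handled inside the loop.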

Emanuel
0

It seems there is not enough memory available, so the program takes a long time. There are two solutions: break the file up into smaller files, or run the script on a machine with more than 12 GB of RAM. In my opinion the first option is more feasible.
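One possible way to do the splitting, shown as a minimal sketch: copy the file line by line into numbered chunk files with a fixed number of lines each. The helper name `split_file`, the `chunk_%05d.txt` naming scheme, and the demo file are assumptions for illustration, not part of the original script.

```python
import os
import tempfile


def split_file(path, lines_per_chunk, out_dir):
    """Write consecutive groups of lines from `path` to numbered chunk files."""
    chunk_paths = []
    out = None
    try:
        with open(path, "rb") as source:
            for index, line in enumerate(source):
                # Start a new chunk file every `lines_per_chunk` lines.
                if index % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    name = "chunk_%05d.txt" % len(chunk_paths)
                    chunk_path = os.path.join(out_dir, name)
                    chunk_paths.append(chunk_path)
                    out = open(chunk_path, "wb")
                out.write(line)
    finally:
        # Close the last chunk even if reading or writing fails part-way.
        if out is not None:
            out.close()
    return chunk_paths


# Hypothetical demo: a five-line file split into chunks of two lines each.
work_dir = tempfile.mkdtemp()
demo_path = os.path.join(work_dir, "demo.txt")
with open(demo_path, "wb") as demo:
    demo.write(b"1\n2\n3\n4\n5\n")

chunks = split_file(demo_path, 2, work_dir)
print(len(chunks))
```

Only one line is held in memory at a time, so the split itself should not run into the memory limit; each chunk can then be processed on its own.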