You really have two questions buried in here.
Your Technical Issue
Upgrading to a newer version of Python will most likely resolve the problem, or at least give you a better traceback. The mmap docs specify that, for the default (writable) access mode, you need to open a file for update to mmap it, and you're not currently doing that.
ifile = open(ifilename)  # default mode is 'r' (read-only)
Should be this:
ifile = open(ifilename, 'r+')
Or, if you can update to Python 2.6 as you mentioned in your comments, use a with statement so the file is closed for you:
with open(ifilename, 'r+') as fi:
    # do stuff with the open file
    pass
On 2.7, trying to mmap a file that was not opened with write permission raises a "Permission denied" exception. I suspect that check was not implemented in 2.3, so you are being allowed to continue with an invalid mmap object, which then fails when you try to search it with the regex.
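If you want to see the permission behaviour for yourself, here is a minimal sketch. It assumes Python 3, where a failed mapping surfaces as an OSError, and the helper name try_default_mmap is my own invention, not part of any library. It attempts a default mapping, which asks for write access, on a file opened in whatever mode you pass in:

```python
import mmap


def try_default_mmap(path, mode):
    """Open `path` in `mode` and attempt a default (read/write) mapping.

    Returns True if the mapping was created, False if the OS refused it.
    Hypothetical helper for illustration only.
    """
    with open(path, mode) as f:
        try:
            # Length 0 maps the whole file; no `access` argument means
            # the mapping is requested with write access.
            mm = mmap.mmap(f.fileno(), 0)
        except OSError:
            # Typically "Permission denied" when the file is read-only.
            return False
        mm.close()
        return True
```

I would expect try_default_mmap(path, 'rb') to return False and try_default_mmap(path, 'r+b') to return True on a non-empty file, though the exact error text behind the False case varies by platform.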
mmap vs. open().read()
In the end, you will be able to do (almost) the same thing with both methods. re.search(pattern, mmap_or_long_string) will search either your memory-mapped file or the long string that results from the read() call.
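A rough sketch of that equivalence, assuming Python 3. One difference you will notice there is that an mmap can only be searched with a bytes pattern, so the read() version opens the file in binary mode to match. The function names are hypothetical, and the mapping passes access=mmap.ACCESS_READ so that a plain read-only open is enough:

```python
import mmap
import re


def search_with_read(path, pattern):
    """Read the whole file into a bytes object, then search it."""
    with open(path, "rb") as f:
        data = f.read()  # the entire file is copied into memory here
    match = re.search(pattern, data)
    return match.span() if match else None


def search_with_mmap(path, pattern):
    """Map the file read-only and search the mapping directly."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ requests a read-only map.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = re.search(pattern, mm)  # pattern must be bytes
            return match.span() if match else None
```

Both functions should report the same span for the same file and pattern, for example search_with_mmap(ifilename, rb"error: \d+").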
The main difference between the two methods is in Virtual vs Real Memory consumption.
In a memory-mapped file, the file remains on disk (or wherever it is) and you directly access it through virtual memory addresses. When you read a file in using read(), you are bringing the whole file into (real) memory all at once.
Why One or the Other:
File Size
The most significant limit on the size of the file you can map is the size of your virtual address space, which is dictated by your CPU (32 or 64 bit). The mapping needs a contiguous range of addresses though, so you may get allocation errors if the OS can't find a large enough free block. When using read(), on the other hand, your limit is the physical memory available. If you are accessing files larger than available memory and reading individual lines isn't an option, consider mmap.
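If a file is too large to map in one piece (on a 32-bit build, for example), one option is to map it a window at a time using mmap's offset argument. This is only a sketch, assuming Python 3: count_byte_windowed and its parameters are illustrative names, and it counts a single byte value so that no match can straddle two windows.

```python
import mmap
import os


def count_byte_windowed(path, needle=b"\n", window=None):
    """Count occurrences of a single byte by mapping one window at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    if window is None:
        window = 256 * gran
    # Offsets passed to mmap must be multiples of ALLOCATIONGRANULARITY,
    # so round the window size down to a multiple of it (at least one unit).
    window = max(gran, window - window % gran)

    size = os.path.getsize(path)
    total = 0
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as mm:
                # Scan this window only; the rest of the file is not mapped.
                pos = mm.find(needle)
                while pos != -1:
                    total += 1
                    pos = mm.find(needle, pos + 1)
            offset += length
    return total
```

Searching for multi-byte patterns this way would need overlapping windows, which the sketch deliberately leaves out.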
File Sharing Among Processes
If you are parallelizing read-only operations on a large file, you can map it into memory to share it among processes instead of each process reading in a copy of the whole file.
Readability/Familiarity
Many more people are familiar with the simple open() and read() functions than with memory mapping. Unless you have a compelling reason to use mmap, sticking with the basic IO functions is probably better in the long run for maintainability.
Speed
This one is a wash. A lot of forums and posts like to talk about mmap speed (because it bypasses some system calls once the file is mapped), but the underlying mechanism is still accessing a disk. Reading a whole file in, by contrast, brings everything into memory and only performs disk accesses at the beginning and end of working with the file. There is endless complexity if you try to account for caching (both hard disk and CPU), memory paging, and file access patterns. It is much easier to stick with the tried-and-true method of profiling. You will see different results based on your individual use case and access patterns for your files, so profile both and see which one is faster for you.
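A minimal timing harness along those lines, assuming Python 3 and using timeit from the standard library. The function names are made up for illustration, and whatever numbers come out only mean something for your own files and access patterns:

```python
import mmap
import re
import timeit


def count_matches_read(path, pattern):
    """Count regex matches after reading the whole file into memory."""
    with open(path, "rb") as f:
        data = f.read()
    return sum(1 for _ in re.finditer(pattern, data))


def count_matches_mmap(path, pattern):
    """Count regex matches by searching a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return sum(1 for _ in re.finditer(pattern, mm))


def compare(path, pattern, repeat=5):
    """Return the best wall-clock time in seconds for each approach."""
    best_read = min(timeit.repeat(
        lambda: count_matches_read(path, pattern), repeat=repeat, number=1))
    best_mmap = min(timeit.repeat(
        lambda: count_matches_mmap(path, pattern), repeat=repeat, number=1))
    return {"read": best_read, "mmap": best_mmap}
```

Call it as compare(ifilename, rb"your pattern") on a representative file. Keep in mind that after the first run the OS file cache is warm, so repeated timings mostly measure the cached case.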