import shapefile
data = shapefile.Reader("data_file.shp")
shapes = data.shapes()

My problem is that getting the shapes from the Shapefile reader raises a MemoryError exception when using Pyshp.

The .shp file is quite large, at 1.2 GB. But I am using only 3% of my machine's 32 GB of RAM, so I don't understand it.

Is there any other approach that I can take? Can I process the file in chunks in Python? Or use some tool to split the file into chunks, then process each of them individually?

Mawg says reinstate Monica

2 Answers


Quoting from this answer by thomas:

The MemoryError exception that you are seeing is the direct result of running out of available RAM. This could be caused by either the 2 GB per-program limit imposed by Windows on 32-bit programs, or a lack of available RAM on your computer. (This link is to a previous question.) You should be able to get past the 2 GB limit by using a 64-bit copy of Python, provided you are using a 64-bit copy of Windows.

So try a 64-bit copy of Python, or provide more detail about your platform and Python version.
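To confirm which interpreter is actually running, a quick check like the sketch below can help. It relies only on the standard library: `struct.calcsize("P")` gives the pointer size of the current build in bytes, and `sys.maxsize` exceeds 2**32 only on 64-bit builds. The function names are illustrative, not part of any library.

```python
import platform
import struct
import sys


def python_bitness() -> int:
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build.
    return struct.calcsize("P") * 8


def is_64bit_python() -> bool:
    """True on 64-bit builds, where sys.maxsize is 2**63 - 1 (2**31 - 1 on 32-bit)."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"Python {platform.python_version()} ({python_bitness()}-bit) on {platform.system()}")
    if not is_64bit_python():
        # A 32-bit process cannot use all of a 32 GB machine, however much is free.
        print("32-bit interpreter: address space is limited to a few GB.")
```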

nick_gabpe

Although I haven't been able to test it, Pyshp should be able to read it regardless of the file size or memory limits. Creating the Reader instance doesn't load the entire file, only the header information.

It seems the problem here is that you used the shapes() method, which reads all shape information into memory at once. This usually isn't a problem, but it is with files this big. As a general rule you should instead use the iterShapes() method which reads each shape one by one.

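import shapefile
data = shapefile.Reader("data_file.shp")
# iterShapes() yields shapes one at a time instead of building a full list
for shape in data.iterShapes():
    # do something with each shape, e.g. report its type and point count
    print(shape.shapeType, len(shape.points))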
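The question also asked about processing the file in chunks. Because iterShapes() returns a lazy iterator, batching it needs no external tool. The sketch below is a generic helper (the name `chunked` and the batch size are hypothetical choices, not part of Pyshp) that groups any iterable into lists of at most `size` items, holding only one batch in memory at a time.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from any iterable.

    Items are pulled lazily with islice, so only the current batch is
    materialized; the rest of the source iterator is never read ahead.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return  # source exhausted
        yield batch
```

Assuming the `data` reader from above, it would be used as `for batch in chunked(data.iterShapes(), 10000): ...`. On Python 3.12 and later, `itertools.batched` does the same job, yielding tuples instead of lists.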
Karim Bahgat
  • I can confirm that the problem was not with the creation of the object, but with the `shapes()` method. However, the real cause was that I was using 32-bit Python, which cannot address more than 4 GB of RAM. When I installed the 64-bit version, the problem went away. Still, if the `iterShapes()` method only loads a single shape into memory at a time, then of course I will use that – Mawg says reinstate Monica Sep 23 '16 at 13:08