
The code below reads a ~500 MB file containing 5000x5000 floats, separated by commas and newlines.

class Pixel:
    def __init__(self, value, y_val):
        self.val = value
        self.sum = 0
        self.y = y_val

    def __repr__(self):
        return "(%.2f, %.2f)" % (self.val, self.sum)

def Build_Pixel_Array_From_File(filename):
    with open(filename) as pixels:
        # First pass: hold the entire file in memory as lists of strings
        pixelArray = [line.split(", ") for line in pixels]
    # Second pass: replace each string with a Pixel object
    for i in range(len(pixelArray)):
        for j in range(len(pixelArray[i])):  # was len(pixelArray); only correct because the grid is square
            pixelArray[i][j] = Pixel(float(pixelArray[i][j]), j)
    return pixelArray

# Main
filename = "input.txt"
pixels = Build_Pixel_Array_From_File(filename)
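
For scale: 5000 x 5000 is 25 million Pixel instances, and each instance of a plain class carries its own attribute dictionary on top of the stored values. A quick sketch to estimate the per-object cost (illustrative only; exact numbers vary by interpreter and platform):

import sys

p = Pixel(1.0, 0)
# Instance overhead plus its attribute dict; the attribute values
# themselves (two floats and an int) add further per-object cost.
per_pixel = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
total_gib = per_pixel * 5000 * 5000 / float(2 ** 30)
print("%d bytes per Pixel, roughly %.1f GiB for the full grid" % (per_pixel, total_gib))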

I originally assumed I had underestimated the amount of memory I would need, so I edited my pycharm64.exe.vmoptions:

-Xms1024m
-Xmx2048m
-XX:MaxPermSize=1024m
-XX:ReservedCodeCacheSize=240m
-XX:+UseConcMarkSweepGC
-XX:SoftRefLRUPolicyMSPerMB=50
-ea
-Dsun.io.useCanonCaches=false
-Djava.net.preferIPv4Stack=true
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow

However, the program still dies with only a MemoryError after ~15 seconds of run time. I am running a 64-bit system, and the settings do seem to be taking effect: PyCharm's built-in memory bar shows the changes, but the program still fails while building the array.
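
For reference, one way to shrink each instance, if the Pixel class has to stay, would be __slots__, which removes the per-instance __dict__ entirely (a sketch, untested on the real data):

class Pixel(object):          # new-style class, required for __slots__ on Python 2
    __slots__ = ("val", "sum", "y")   # fixed attribute set, no per-instance dict

    def __init__(self, value, y_val):
        self.val = value
        self.sum = 0
        self.y = y_val

    def __repr__(self):
        return "(%.2f, %.2f)" % (self.val, self.sum)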

EDIT: It was suggested that I cast the strings to floats on the fly while filling the array. Doing this still ran out of memory, just before the 10 millionth element (40%):

import csv

def Build_Pixel_Array_From_File(filename):
    pixelArray = []
    with open(filename) as f:               # make sure the file gets closed
        parser = csv.reader(f, skipinitialspace=True)
        for row in parser:
            pixelSubArray = []
            for cell_num, cell in enumerate(row):
                pixelSubArray.append(Pixel(float(cell), cell_num))
            pixelArray.append(pixelSubArray)
    return pixelArray
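
For comparison, a NumPy-based loader would store the same grid as 25 million packed 8-byte floats, roughly 200 MB, instead of 25 million Python objects. This is a sketch assuming NumPy is acceptable, not the approach from the question:

import numpy as np

# One packed float64 array (~200 MB) instead of a grid of objects
values = np.loadtxt("input.txt", delimiter=",")   # shape (5000, 5000)
sums = np.zeros_like(values)                      # replaces Pixel.sum
# Pixel.y is just the column index: np.arange(values.shape[1])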
user66050
    This might help http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python – Ashish May 17 '16 at 07:24
  • You're essentially still going to hold most of that ~500 MB of text in memory, because the first loop allocates everything that isn't a `, ` or `\n` as text. It may help if you traversed the file and converted to floats in one go. – Kendas May 17 '16 at 07:58
  • Even casting to float on the fly causes the same issue (see the edit above). – user66050 May 17 '16 at 14:12
  • If you are using the 'Run with python console' option in your runtime configurations, I think it takes up a lot of memory. It might work with the box unchecked. – Alex Li Aug 02 '19 at 15:19
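
Following the lazy-reading suggestion in the comments above, a generator that yields one converted row at a time keeps only a single row in memory. A sketch, useful only if the rows can be processed independently rather than held all at once:

import csv

def iter_pixel_rows(filename):
    # Yield one row of Pixel objects at a time instead of the whole grid
    with open(filename) as f:
        for row in csv.reader(f, skipinitialspace=True):
            yield [Pixel(float(cell), j) for j, cell in enumerate(row)]

# Usage: accumulate whatever per-row result is needed, then discard the row
# for row in iter_pixel_rows("input.txt"):
#     ...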

0 Answers