The code below reads a ~500 MB file containing 5000×5000 floats, separated by commas and newlines.
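For anyone trying to reproduce this, a small file in the same format (floats joined by ", ", one row per line) can be generated with something like the sketch below; the dimensions are scaled down and the value range is my own arbitrary choice:

```python
import random

def write_sample(filename, rows=5, cols=5):
    """Write a small grid of floats in the same 'a, b, c' row format."""
    with open(filename, "w") as f:
        for _ in range(rows):
            f.write(", ".join("%.2f" % random.uniform(0, 255) for _ in range(cols)))
            f.write("\n")

write_sample("sample.txt")
```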
class Pixel:
    def __init__(self, value, y_val):
        self.val = value
        self.sum = 0
        self.y = y_val

    def __repr__(self):
        return "(%.2f, %.2f)" % (self.val, self.sum)

def Build_Pixel_Array_From_File(filename):
    with open(filename) as pixels:
        pixelArray = [line.split(", ") for line in pixels]
    for i in range(len(pixelArray)):
        for j in range(len(pixelArray[i])):  # row length, not row count
            pixelArray[i][j] = Pixel(float(pixelArray[i][j]), j)
    return pixelArray
# Main
filename = "input.txt"
pixels = Build_Pixel_Array_From_File(filename)
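A rough way to see the scale involved: each `Pixel` instance carries a per-instance `__dict__`, so the cost per cell is far more than the 8 bytes a raw float would take, and there are 25 million cells. A back-of-the-envelope check (exact numbers vary by interpreter version and build) looks like this:

```python
import sys

class Pixel:
    def __init__(self, value, y_val):
        self.val = value
        self.sum = 0
        self.y = y_val

p = Pixel(1.0, 0)
# Instance header plus its attribute dict; the float/int values add more on top.
per_object = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
total_objects = 5000 * 5000
print("approx bytes per Pixel (excluding attribute values):", per_object)
print("approx total for the grid: %.1f GB" % (per_object * total_objects / 1e9))
```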
I originally assumed that I had underestimated the amount of memory I would need, so I edited my pycharm64.exe.vmoptions:
-Xms1024m
-Xmx2048m
-XX:MaxPermSize=1024m
-XX:ReservedCodeCacheSize=240m
-XX:+UseConcMarkSweepGC
-XX:SoftRefLRUPolicyMSPerMB=50
-ea
-Dsun.io.useCanonCaches=false
-Djava.net.preferIPv4Stack=true
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
However, the program still dies with nothing but a MemoryError
after ~15 seconds of run time. I am on a 64-bit system, and the settings do seem to be taking effect: PyCharm's built-in memory bar shows the changes, but the program still fails while building the array.
EDIT: It was suggested to cast the strings to floats on the fly while filling the array. Doing this still caused a failure, just before the 10-millionth element (~40%):
import csv

def Build_Pixel_Array_From_File(filename):
    pixelArray = []
    with open(filename) as f:  # close the file when done
        parser = csv.reader(f, skipinitialspace=True)
        for row in parser:
            pixelSubArray = []
            for cell_num, cell in enumerate(row):
                pixelSubArray.append(Pixel(float(cell), cell_num))
            pixelArray.append(pixelSubArray)
    return pixelArray