I have a number of very large text files which I need to process, the largest being about 60GB.
Each line has 54 characters in seven fields, and I want to remove the last three characters from each of the first three fields, which should reduce the file size by about 20%.
I am brand new to Python and have some code that does what I want at about 3.4 GB per hour, but for this to be a worthwhile exercise I really need at least 10 GB/hr. Is there any way to speed this up? The code doesn't come close to challenging my processor, so my uneducated guess is that it is limited by the read and write speed of the internal hard drive.
def ProcessLargeTextFile():
    r = open("filepath", "r")
    w = open("filepath", "w")
    l = r.readline()
    while l:
        # Take the first three space-separated fields
        x = l.split(' ')[0]
        y = l.split(' ')[1]
        z = l.split(' ')[2]
        # Drop the last three characters of each and write the line out
        w.write(l.replace(x,x[:-3]).replace(y,y[:-3]).replace(z,z[:-3]))
        l = r.readline()
    r.close()
    w.close()
Any help would be really appreciated. I am using the IDLE Python GUI on Windows 7 and have 16GB of memory. Perhaps a different OS would be more efficient?
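One rough way to test the guess that the disk is the limit would be to time a plain copy of a file with no per-line work at all and compare that rate with the 3.4 GB/hr figure. The sketch below assumes Python 3; `measure_copy_rate`, the 16 MB chunk size, and the throwaway demo file are illustrative names and values, not part of my actual script. Operating system caching can inflate the result on small files, so the number is only meaningful on a file much larger than memory.

```python
import os
import tempfile
import time


def measure_copy_rate(src_path, dst_path, chunk_size=16 * 1024 * 1024):
    """Copy src_path to dst_path in binary chunks and return bytes per second."""
    copied = 0
    start = time.perf_counter()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    elapsed = time.perf_counter() - start
    return copied / elapsed if elapsed > 0 else float("inf")


# Small demo on a throwaway file (made-up size, not the real 60 GB input).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "in.txt")
    dst_path = os.path.join(tmp, "out.txt")
    with open(src_path, "wb") as f:
        f.write(b"70700.642014 31207.277115 -0.054123 -1585 255 255 255\n" * 100000)
    rate = measure_copy_rate(src_path, dst_path)
    copied_ok = os.path.getsize(dst_path) == os.path.getsize(src_path)
    print("%.2f GB/hr" % (rate * 3600 / 1e9))
```

If the plain copy runs far faster than 3.4 GB/hr, the per-line string handling rather than the drive is the more likely bottleneck.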
Edit: Here is an extract of the file to be processed.
70700.642014 31207.277115 -0.054123 -1585 255 255 255
70512.301468 31227.990799 -0.255600 -1655 155 158 158
70515.727097 31223.828659 -0.066727 -1734 191 187 180
70566.756699 31217.065598 -0.205673 -1727 254 255 255
70566.695938 31218.030807 -0.047928 -1689 249 251 249
70536.117874 31227.837662 -0.033096 -1548 251 252 252
70536.773270 31212.970322 -0.115891 -1434 155 158 163
70533.530777 31215.270828 -0.154770 -1550 148 152 156
70533.555923 31215.341599 -0.138809 -1480 150 154 158
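To make the intended transformation concrete, here is a minimal sketch applied to the first sample line. It assumes every line has at least four space-separated fields, splits each line only once, and slices the first three fields directly instead of using `replace`. The names `trim_line` and `process_stream` are hypothetical, and whether this reaches 10 GB/hr is untested.

```python
import io


def trim_line(line):
    # Split once: three leading fields plus the untouched remainder
    # (the remainder keeps its trailing newline).
    x, y, z, rest = line.split(" ", 3)
    return " ".join((x[:-3], y[:-3], z[:-3], rest))


def process_stream(src, dst):
    # Iterating over a file object reads it line by line with buffering.
    for line in src:
        dst.write(trim_line(line))


sample = "70700.642014 31207.277115 -0.054123 -1585 255 255 255\n"
out = io.StringIO()
process_stream(io.StringIO(sample), out)
print(out.getvalue(), end="")
# prints: 70700.642 31207.277 -0.054 -1585 255 255 255
```

With real files, `src` and `dst` would be two different file objects opened with `open(...)`, ideally inside a `with` block so both are closed even if an error occurs.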