This is my second day working in Python. I worked on this in C++ for a while, but decided to try Python. My program works as expected. However, when I process one file at a time without the glob loop, it takes about half an hour per file. When I include the glob, the loop takes about 12 hours to process 8 files.
My question is this: is there anything in my program that is definitely slowing it down? Is there anything I should be doing to make it faster?
I have a folder of large files. For example:
file1.txt (6 GB)
file2.txt (5.5 GB)
file3.txt (6 GB)
If it helps, each line of data begins with a character that tells me how the rest of the characters are formatted, which is why I have all of the if elif statements. A line of data would look like this: T35201 M352 RZNGA AC
I am trying to read each file, do some parsing using splits, and then save the file.
The computer has 32 GB of RAM, so my method is to read each file into RAM, loop through its lines, and then save the result, clearing RAM for the next file.
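To make the memory approach concrete, here is a minimal sketch of the two ways of looping over a file. As far as I understand, `f.readlines()` builds a list of every line before the loop starts, while iterating over the file object hands back one line at a time. The helper names `count_lines_eager` and `count_lines_lazy` and the three-line sample file are hypothetical, only there for illustration.

```python
import os
import tempfile


def count_lines_eager(path):
    # readlines() loads every line of the file into a list first
    with open(path, 'r') as f:
        return sum(1 for _ in f.readlines())


def count_lines_lazy(path):
    # iterating over the file object reads one buffered line at a time
    with open(path, 'r') as f:
        return sum(1 for _ in f)


# Tiny stand-in for one of the large files (hypothetical sample data)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("T35201\nM352\nRZNGA AC\n")
    sample_path = tmp.name

eager = count_lines_eager(sample_path)
lazy = count_lines_lazy(sample_path)
os.remove(sample_path)
```

Both functions give the same count; the difference is only in how much of the file is held in memory at once.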
I've included the file so you can see the methods that I am using. I use an if elif statement that uses about 10 different elif commands. I have tried a dictionary, but I couldn't figure that out to save my life.
Any answers would be helpful.
    import csv
    import glob

    for filename in glob.glob("/media/3tb/5may/*.txt"):
        f = open(filename,'r')
        c = csv.writer(open(filename + '.csv','wb'))
        second=0
        mill=0
        for line in f.readlines():
            #print line
            event=0
            ticker=0
            marketCategory=0
            variable = line[0:1]
            if variable is 'T':
                second = line[1:6]
                mill=0
            else:
                second = second
            if variable is 'R':
                ticker = line[1:7]
                marketCategory = line[7:8]
            elif variable is ...
            elif variable is ...
            elif ...
            elif ...
            elif ...
            elif ...
            elif
            if variable != 'T' and variable != 'M':
                c.writerow([second,mill,event ....])
        f.close()
UPDATE
Each of the elif statements is nearly identical. The only part that changes is the way that I split the line. Here are two of the elif statements (there are 13 in total, and they are almost all identical except for the way the line is split).
    elif variable is 'C':
        order = line[1:10]
        Shares = line[10:16]
        match = line[16:25]
        printable = line[25:26]
        price = line[26:36]
    elif variable is 'P':
        ticker = line[17:23]
        order = line[1:10]
        buy = line[10:11]
        shares = line[11:17]
        price = line[23:33]
        match = line[33:42]
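Since the branches differ only in their slice positions, the dictionary idea I could not get working might look roughly like the sketch below. The names `FIELD_SLICES` and `parse_line` are hypothetical, the slices are copied from the 'R', 'C' and 'P' branches above, and the other message types are left out. The sample line is made up to fit the 'R' layout.

```python
# Field layouts keyed by the leading character of each line.
# Only the three branches shown in the question are included.
FIELD_SLICES = {
    'R': {'ticker': slice(1, 7), 'marketCategory': slice(7, 8)},
    'C': {
        'order': slice(1, 10),
        'shares': slice(10, 16),
        'match': slice(16, 25),
        'printable': slice(25, 26),
        'price': slice(26, 36),
    },
    'P': {
        'order': slice(1, 10),
        'buy': slice(10, 11),
        'shares': slice(11, 17),
        'ticker': slice(17, 23),
        'price': slice(23, 33),
        'match': slice(33, 42),
    },
}


def parse_line(line):
    """Return {field name: substring}, or None for an unknown line type."""
    layout = FIELD_SLICES.get(line[0:1])
    if layout is None:
        return None
    return {name: line[sl] for name, sl in layout.items()}


# Hypothetical 'R' line: type character, 6-character ticker, 1-character category
sample = "R" + "ZNGA".ljust(6) + "C"
fields = parse_line(sample)
```

With this layout, adding another message type would mean adding one more entry to the dictionary instead of another elif branch.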
UPDATE2
I have run the code using
    for file in f
two different times. The first time, I ran a single file without
    for filename in glob.glob("/media/3tb/file.txt"):
hard-coding the file path for that one file instead, and it took about 30 minutes. I ran it again with
    for filename in glob.glob("/media/3tb/*file.txt")
and it took an hour for just one file in the folder. Does the glob code add that much time?
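One way to check this would be to time the glob call on its own, separately from the parsing. A minimal sketch follows, assuming a throwaway folder with three small files standing in for the real ones (the folder and file names are hypothetical). As far as I understand, `glob.glob` only returns the list of matching paths and does not read the files themselves.

```python
import glob
import os
import shutil
import tempfile
from timeit import default_timer

# Build a temporary folder with a few small files (hypothetical stand-ins)
tmp_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmp_dir, 'file%d.txt' % i), 'w') as f:
        f.write('T35201\n')

# Time only the pattern matching, not any file reading
start = default_timer()
matches = sorted(glob.glob(os.path.join(tmp_dir, '*.txt')))
glob_seconds = default_timer() - start

print('glob matched %d files in %.6f s' % (len(matches), glob_seconds))

# Remove the temporary folder again
shutil.rmtree(tmp_dir)
```

Running the same timing against the real folder would show whether the glob call accounts for any meaningful part of the extra half hour.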