0

My python scripts parses files sequentially, and makes simple data cleaning and writes to a new csv file. I'm using csv. the script is taking awfully long time to run.

cProfile output is as follows:enter image description here

I have done a lot of googling before posting the question here.

link to the image image link

Adding code here, the function which is called

def db_insert(coCode, bse):
start = time()
q = []
print os.path.join(FILE_PATH, str(bse)+"_clean.csv");
f1 = open(os.path.join(FILE_PATH, str(bse)+"_clean.csv"))
reader = csv.reader(f1)
reader.next()
end = time()
# print end-start
for idx,row in enumerate(reader):
    ohlc = {}
    date = datetime.strptime( row[0], '%Y-%m-%d')
    date = row[0]
    row  = row[1:6]
    (op, high, low, close, volume) = row
    ohlc[date] = {}
    ohlc[date]['open'] = op
    ohlc[date]['high'] = high
    ohlc[date]['low'] = low
    ohlc[date]['close'] = close
    ohlc[date]['volume'] = volume
    q.append(ohlc)
end1 = time()
# print end1-end

db.quotes.insert({'bse':str(bse), 'quotes':q})
# print time()-end1
f1.close()
q = []
print os.path.join(FILE_PATH, str(coCode)+".csv");
f2 = open(os.path.join(FILE_PATH, str(bse)+"_clean.csv"))
reader = csv.reader(f2)
reader.next()
for idx,row in enumerate(reader):
    ohlc = {}
    date = datetime.strptime( row[0], '%Y-%m-%d')
    date = row[0]
    try:
        extra = row[7]+row[8]+row[9]
    except:
        try:
            extra = row[7]
        except:
            extra = ''
    row  = row[1:6]
    (op, high, low, close, volume) = row
    ohlc[date] = {}
    ohlc[date]['open'] = op
    ohlc[date]['high'] = high
    ohlc[date]['low'] = low
    ohlc[date]['close'] = close
    ohlc[date]['volume'] = volume
    ohlc[date]['extra'] = extra
    q.append(ohlc)
db.quotes_unadjusted.insert({'bse':str(bse), 'quotes':q})
f2.close()
Krishna Deepak
  • 1,735
  • 2
  • 20
  • 31

1 Answers1

8

I found this explanation in an answer by John Machin.

ncalls is relevant only to the extent that comparing the numbers against other counts such as number of chars/fields/lines in a file may highligh anomalies; tottime and cumtime is what really matters. cumtime is the time spent in the function/method including the time spent in the functions/methods that it calls; tottime is the time spent in the function/method excluding the time spent in the functions/methods that it calls.

Community
  • 1
  • 1
Praxeolitic
  • 22,455
  • 16
  • 75
  • 126