
I have a folder with a number of CSV files. Each file follows the same format with the header:

date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos

A typical CSV file will look like:

date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos
2015-07-27,-0.0,0.0,0.0,0.0,0.0
2015-07-28,-0.0,0.0,0.0,0.0,0.0
2015-07-29,-0.6738699251792465,0.0,-0.6738699251792465,-0.0,-0.027000000000000003
2015-07-30,-0.0,-123.88294424426506,-123.88294424426506,-4.961880089696313,-4.961880089696313
2015-07-31,-0.0,1.9275568497366795,1.9275568497366795,0.09627642044988116,0.09627642044988116

However, some files contain NaN values, for example:

date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos
2015-07-27,-0.0,0.0,0.0,0.0,0.0
2015-07-28,-0.0,0.0,0.0,0.0,0.0
2015-07-29,NaN,NaN,NaN,0.0,0.0
2015-07-30,NaN,NaN,NaN,0.0,0.0
2015-07-31,NaN,NaN,NaN,0.0,0.0

I have two functions, hit_rate and max_drawdown, that I use to process these files:

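import numpy as np
import pandas as pd

def hit_rate(array_like):
    # Ratio of positive to non-positive counts among the nonzero entries.
    seq = np.array(array_like)
    seq = seq[np.nonzero(seq)]
    total_num = len(seq)
    if total_num == 0: return -float('inf')
    pos_num = len(seq[seq > 0.0])
    neg_sum = total_num - pos_num
    if neg_sum == 0: return float('inf')
    return pos_num / neg_sum

def max_drawdown(ser):
    # Largest drop below the running maximum (0 if the series never falls).
    # ser.expanding().max() replaces pd.expanding_max, which newer pandas removed.
    running_max = ser.expanding().max()
    cur_dd = ser - running_max
    return min(0, cur_dd.min())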

The CSV data is passed to these functions as array_like and ser. The script fails when it encounters a NaN value. Is there a way to either set the NaN values to zero or ignore them when processing the CSV file?
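
For reference, the two options asked about (zero-filling versus ignoring NaN) might look roughly like the sketch below. It assumes the files are loaded with pandas `read_csv`; the inline `SAMPLE_CSV` string stands in for one of the files and the variable names `filled` and `dropped` are illustrative, not taken from the original scripts.

```python
import io

import pandas as pd

# Stand-in for one of the CSV files: the NaN example from the question.
SAMPLE_CSV = """date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos
2015-07-27,-0.0,0.0,0.0,0.0,0.0
2015-07-28,-0.0,0.0,0.0,0.0,0.0
2015-07-29,NaN,NaN,NaN,0.0,0.0
2015-07-30,NaN,NaN,NaN,0.0,0.0
2015-07-31,NaN,NaN,NaN,0.0,0.0
"""

# read_csv parses the literal string "NaN" as a missing value by default.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["date"], index_col="date")

# Option 1: set every NaN to zero before further processing.
filled = df.fillna(0.0)

# Option 2: ignore NaN by dropping the missing entries of one column.
dropped = df["total_pnl_pos"].dropna()
```

Which option is appropriate depends on what a NaN means in the data: filling with zero treats a missing day as a flat day, while dropping it removes the day from the calculation entirely.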

halfer
Stacey
