2

I have 1000+ .txt files with stock dates and prices that I've cast to a dictionary (with filename(stock ticker) as the key, and the data for each file as a data frame). I calculated the moving average with .rolling, then found the percent difference between the moving average and the price. So, the percent difference is its own column for each DataFrame. The Code for all this looks like this:

filelist = os.listdir(r'Insert File Path')
filepath = r'Insert File Path'


dic1 = {}

for file in filelist:
    df = pd.read_csv(filepath + file,sep='\t')
dic1[file]= df

for value in dic1.values():
    value.rename(columns={value.columns[0]:'Dates',value.columns[1]:'Prices'},inplace=True)

for value in dic1.values():
    value['ma'] = value['Prices'].rolling(window=50).mean()

for value in dic1.values():
    value['diff'] = value['Prices'] - value['ma']

for value in dic1.values():
     value['pctdiff']= value['diff']/value['Prices']

My question is how do I find the top 5 greatest (and smallest, because they can be negative) of the pctdiff columns?

I've tried:

for df in dic1.values():
    for num in df['pctdiff'].max():
        print(num.max())

but I get the following error: "'float' object is not iterable"

dergky
  • 105
  • 1
  • 9
  • Could this be of any help? https://stackoverflow.com/questions/6910641/how-do-i-get-indices-of-n-maximum-values-in-a-numpy-array – warnbergg Feb 17 '20 at 05:58
  • @itslwg I don't think so because that is talking about an array, where this is a column within a data frame. but thank you. – dergky Feb 17 '20 at 06:09
  • Or this https://stackoverflow.com/questions/34133494/sorting-a-pandas-series-in-absolute-value. The `pctdiff` is a pandas Series right? Then i think you can use this – warnbergg Feb 17 '20 at 06:11

2 Answers2

2

Is this what you mean?

list_result = []
for key,value in dic1.items():
    value.rename(columns={value.columns[0]:'Dates',value.columns[1]:'Prices'},inplace=True)
    value['ma'] = value['Prices'].rolling(window=50).mean()
    value['diff'] = value['Prices'] - value['ma']
    value['pctdiff']= value['diff']/value['Prices']
    list_result.append([key,value['pctdiff'].max()])

list_result.sort(key = lambda x : x[1] )
highest_list = list_result[-5:]
smallest_list = list_result[:5]
LaChatonnn
  • 159
  • 7
  • This seemed to work, but how can I figure out where the values are in the dictionary (or what stock they correspond to)? Thanks – dergky Feb 17 '20 at 17:16
  • @dergky I edit the post regarding to your requirement. However, beware that if the value are the same (5th = 6th), you will miss the 6th value. To improve, you need to create a function to check it. – LaChatonnn Feb 17 '20 at 18:03
  • Thank you so much! super helpful! Just a note, in line 7, I changed df['pctdiff'] to value['pctdiff'] and that seems to work. It didn't work with df. – dergky Feb 17 '20 at 18:17
  • @dergky Yes, you are correct. Sorry for my mistake. I just always call my vars with df – LaChatonnn Feb 18 '20 at 00:17
0

Just to make the code a bit more clean, and run all you variable adding in one for loop insted of four

filelist = os.listdir(r'Insert File Path')
filepath = r'Insert File Path'

dic1 = {}

for file in filelist:
    df = pd.read_csv(filepath + file,sep='\t')
dic1[file]= df

for value in dic1.values():
    value.rename(columns={value.columns[0]:'Dates',value.columns[1]:'Prices'},inplace=True)
    value['ma'] = value['Prices'].rolling(window=50).mean()
    value['diff'] = value['Prices'] - value['ma']
    value['pctdiff']= value['diff']/value['Prices']

and then use the answer by @Edchum here to sort the pctdiff by absolute value (convert it to a pandas Series if the object is something else). Something like (if you want to store it sorted)

...

for value in dic1.values():
    ...
    pctdiff = value['diff']/value['Prices']
    pctdiff = pctdiff.reindex(pctdiff.abs().sort_values().index)
    value['pctdiff']= pctdiff
warnbergg
  • 552
  • 4
  • 14
  • thanks for your help cleaning up the code, but the solution didn't seem to work, it only seemed to work for one value in the dictionary, not all the values (all the stocks). – dergky Feb 17 '20 at 17:24