Save vars pr iteration to df and when done save df to csv

Question

I need to make a DataFrame (df_max_res) with the 15 best performances from my stock strategies combined with company tickers (AAPL for Apple Computers etc.). I have a list of more than 500 stock tickers that I fetch and on which I analyze using four of my own strategies.

In the for eachP in perf_array nested inner iteration I obtain performance results from all combinations of the strategy and ticker. I want to save these results to a DataFrame and to a csv file using this code (or a better suggestion):

#==============================================================================
#     Saving results in pandas and to a csv-file
#==============================================================================
def saving_res_pandas():
    global df_res, df_max_res
    df_res = pd.DataFrame(columns=('Strategy', 'Ticker', 'Strat', 
                                   'ROI', 'Sharpe R', 'VaR'))
    for eachP in perf_array:
        df_res.loc[len(df_res) + 1] = [strategy, ticker, strat, stratROI]  
    # Select the top 15 of all results (ticker/strategy combo) into new df.
    df_max_res = df_res[:15]           
    # Saving to a csv.
    df_max_res.to_csv('df_performance_data_sp500ish.csv')
    print('After analysing %1.1f Years ~ %d workdays - %d strategies and %d tickers' '\n'
          'The following matrix of tickers and strategies show highest ROI: ' 
          % (years, days, len(strategies), len(stock_list))
         )

    return df_res
#==============================================================================
# Chose which of below methods to save perf-data to disk with
#==============================================================================
saving_res_pandas()

# Reading in df_max_res with best ticker/strategy results
df_max_res = pd.read_csv('df_performance_data_sp500ish.csv')
print(df_max_res)

The code above creates my DataFrame just fine, but it does not save the iteration performance result as I expect.

I am getting this output:

=======================================================
 aa   ===   <function strategy1 at 0x00000000159A0BF8>   ==
=======================================================


Holdings: 0
Funds: 14659

Starting Valuation:  USD 15000.00 ~ DKK: 100000.50
Current Valuation:   USD 14659.05 ~ DKK: 97727.49

===  aa  == <function strategy1 at 0x00000000159A0BF8> ==
ROI: -1.9 perc. & Annual Profit -1894 DKK  ==
######################################################################

cannot set a row with mismatched columns

== ALL Tickers Done for ==  <function strategy1 at 0x00000000159A0BF8> ==================
Strategy analysis pr ticker - COMPLETE !

Empty DataFrame
Columns: [Unnamed: 0, Strategy, Ticker, ROI, SharpeR, VaR]
Index: []

Providing a good example will make it much more likely to get a useful answer: http://stackoverflow.com/help/mcve — Mike Müller, May 23 '15 at 09:10
How many columns do you have? this line `df_res[:15]` is selecting columns up to 15, is that what you want as your comment seems to suggest you want `df_res.head(15)` — EdChum, May 23 '15 at 09:23
Exactly I want to show all columns but only for the top 15 combinations. So I guess I must write df_res.head(15). That would work. For now I only have 3 columns, but more will come. I need to transfer them all to the new df. Thanks Ed. — Morten, May 23 '15 at 13:59
Mike I think that I have provided a good example with both my code and the output I am seeing + explaining what I was expecting and the delta between these situations. How may I be more clear. I have no idea as to how to be more specific than this. Please explain... — Morten, May 23 '15 at 14:06
I'm sorry but you need to fix this quite a bit before anyone can help you. For example, `for eachP in perf_array: df_res.loc[len(df_res)+1]=[strategy, ticker, strat, stratROI]` You don't even use `eachP` in the loop, and aside from that, I can sort of guess what you are trying to do there, but can guarantee that the code is not doing what you want it to. — JohnE, May 23 '15 at 14:19
Anyway, you need to do this: provide sample data with only as many rows and columns as necessary to ask your question (probably 3 columns and 5 rows is plenty). Then show your code start to finish (again as few lines as necessary) and I'd avoid wrapping any of it in functions. — JohnE, May 23 '15 at 14:21
@MortenE -- the way you can be more clear is to have a complete example. Your code is filled with variables/arrays/dataframes that are not defined. **No one else besides you** can run this code unless you provide sample data for all of those. This is what joris kept telling you in the other question and why it ended up getting closed. — JohnE, May 23 '15 at 14:28
Good post for how to ask a good pandas question: http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — JohnE, May 23 '15 at 22:53
@JohnE, Thank you for the link to a good SO post. It is inspirational. I will try and be more brief and make reproducable code, so it is easy to guide me better in future posts. Thanks again for the help and inspiration. — Morten, May 25 '15 at 10:01

score 0 · Answer 1 · answered May 23 '15 at 18:37

I've tried to reduce your code to make it more readable:

 1. def saving_res_pandas():
 2.     cols = ('Strategy', 'Ticker', 'Strat', 'ROI', 'Sharpe R', 'VaR')
 3.     df_res = pd.DataFrame(columns=cols)
 4.     for _ in perf_array:
 5.         df_res.loc[len(df_res) + 1] = [strategy, ticker, strat, stratROI]  
 6.     # Select the top x of all results (ticker/strategy combo) into new df.
 7.     df_max_res = df_res[:15]           
 8.     df_max_res.to_csv('df_performance_data_sp500ish.csv')
 9.     print('After analysing {0:.1f} Years ~ {1} workdays - {2} strategies and {3} tickers' '\n'
10.           'The following matrix of tickers and strategies show highest ROI: '  
11.           .format(years, days, len(strategies), len(stock_list)))
12.     return df_res

Based on the code above, I have two questions:

On line 5, how are the values obtained for strategy, ticker, strat and stratROI
On line 7, you are taking the top 15 items of df_res, but the DataFrame has not been sorted. In your original code, the sort line below was commented out (hence why I removed it in my edit).

df_res.reset_index().sort(['ROI', 'VaR', 'Sharpe R'], ascending=[0,1,0])

When you say you want the 15 best performances, on which metric (ROI, Var, Sharpe, etc.)?

Hi Alexander, Thank you for your efforts. Q1 on line 5 I am using following vars: strategy (opbtained from the iteration itself, so starting with strategy0 ending on strategy3, ticker (same thing from where it is in the iteration from 560 diff company tickers, Return-On-Investment (ROI) is the interesting part that where the actual performance of the prior 2 vars (strategy and ticker combined) ROI is a percentage. Line 7: And the df_res should be sorted from this result prior to cutting out the df_max_res with the top 15. Sharpe R and VaR can wait. — Morten, May 24 '15 at 17:21
OK thanks. It's almost there. With you edits I am now getting almost the proper information saved. Only hickup is that it saves only from last run of iterations. So somewhere I am not getting saved pr iteration run. This is my output: 'After analysing 1.2 Years ~ 302 workdays - 4 strategies and 2 tickers The following matrix of tickers and strategies show highest ROI: Unnamed: 0 Strategy Ticker ROI 0 1 aa 1.082333' — Morten, May 24 '15 at 17:44
This means that the pertinent info is actually getting saved (strategy name, ticker, and its ROI). Now if it would just save all of the iteration run results each on a new line of the df - that would be swell ! — Morten, May 24 '15 at 17:46

Morten · Accepted Answer · 2015-05-26T12:47:25.047

Finally I managed to come up with the right answer to my worries.

I solved it this way:

Before the for loops:

# Creating the df that will save my results in the backtest iterations
cols = ('Strategy','Ticker','ROI')  # ,'Sharpe R','VaR','Strat'
df_res = pd.DataFrame(columns = cols)

Inside the for, and nested for loops

def saving_res_pandas():
    global df_res, df_max_res
    df_res = df_res.append({'Ticker':ticker,'Strategy':strategy, 'ROI':stratROI,}, ignore_index = True)

    return df_res

Outside and after the for loops:

        df_res = df_res.sort(['ROI'], ascending=[0])
        df_max_res = df_res.head(15)           # Select the top x of all results (ticker/strategy combo) into new df
        # saving to a csv #
        df_max_res.to_csv('df_performance_data_sp500ish.csv')

    print('After analysing %1.1f Years ~ %d workdays - %d strategies and %d tickers' '\n'
    'The following matrix of tickers and strategies show highest ROI:' %(years, days, len(strategies), len(stock_list))
    )
    print()
    print(df_max_res)

Thank you for all your help and inspiration.

Save vars pr iteration to df and when done save df to csv

2 Answers2