Background:
I am new to Python and programming in general. I want to know whether my implementation approach is correct, or whether it should be done a better way.
Data:
I have a CSV with around 400 stocks, e.g.:
Date, SH Ltd, Date, QS Ltd, Date, WX Ltd ...
26/02/18, 34, 16/06/13, 5634, 15/06/17, 100
27/02/18, 33, 17/06/13, 5763, 16/06/17, 100
28/02/18, 35, 18/06/13, 6139, 17/06/17, 100
...
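A CSV with this layout can be loaded so that each (Date, price) column pair becomes its own time series. Here is a minimal sketch, assuming pandas; the sample data is the snippet above, and pandas renames the duplicate `Date` headers to `Date.1`, `Date.2`, and so on:

```python
import io
import pandas as pd

# Sample matching the layout above: a repeated Date column per ticker,
# each series starting on its own date.
csv_text = """Date,SH Ltd,Date,QS Ltd,Date,WX Ltd
26/02/18,34,16/06/13,5634,15/06/17,100
27/02/18,33,17/06/13,5763,16/06/17,100
28/02/18,35,18/06/13,6139,17/06/17,100
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Columns come back as Date, SH Ltd, Date.1, QS Ltd, Date.2, WX Ltd;
# walk them in pairs and build one Series per ticker.
series = {}
cols = list(raw.columns)
for i in range(0, len(cols), 2):
    date_col, price_col = cols[i], cols[i + 1]
    s = pd.Series(
        raw[price_col].values,
        index=pd.to_datetime(raw[date_col], format="%d/%m/%y"),
        name=price_col,
    ).dropna()  # shorter series leave NaNs at the bottom of longer columns
    series[price_col] = s
```

With all 400 columns in one dict of Series you avoid re-reading the file per stock, and the differing start dates stop being a problem.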
So every date column has its own arbitrary start date, but ALL series end as of yesterday. Now let's say I want to do two things:
1) Calculate the volatility over the last 252 days. 2) Calculate the worst three 2-day gaps of each stock.
My Approach
I am currently thinking I loop through each column of the CSV and create a time-series vector. Then I run a method on it that calculates the 2-day gaps from start date + 2 until today, collects them in a vector, sorts that vector from largest to smallest, and returns the 3 smallest. Then I take the last 252 days, work out the standard deviation of the daily returns, and multiply by the square root of 252.
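The per-series step described above can be sketched without any explicit sorting loop, assuming pandas/NumPy and interpreting the gaps as 2-day percentage changes (the function names here are hypothetical):

```python
import numpy as np
import pandas as pd

def worst_two_day_moves(prices: pd.Series, n: int = 3) -> pd.Series:
    """Return the n worst 2-day percentage changes of a price series."""
    two_day = prices.pct_change(periods=2).dropna()
    return two_day.nsmallest(n)  # replaces the manual sort-then-slice

def annualized_vol(prices: pd.Series, window: int = 252) -> float:
    """Std dev of daily returns over the last `window` days, annualized."""
    daily = prices.pct_change().dropna().tail(window)
    return daily.std() * np.sqrt(252)
```

`pct_change`, `nsmallest`, and `std` are vectorized, so a 400-column loop calling these two functions is cheap; the intermediate Series are created and discarded per column rather than accumulating.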
After this I have 2 outputs:
1) A vector of the worst 2-day performances (3 worst). 2) A number for the volatility over the last 252 days.
At this point I put this data into a CSV for the 2-day performances and a separate volatility CSV. Then I just continue to loop through every other column that has data, appending to the two CSV files above.
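Rather than appending to the two files inside the loop, one option is to accumulate a row per stock in memory and write each CSV once at the end. A minimal sketch, assuming pandas; the filenames and column names are placeholders:

```python
import pandas as pd

# Hypothetical results accumulated during the per-column loop:
# one dict per stock for each output.
perf_rows = [
    {"Stock": "SH Ltd", "Worst1": -0.08, "Worst2": -0.05, "Worst3": -0.04},
]
vol_rows = [
    {"Stock": "SH Ltd", "Vol252": 0.21},
]

# Write each CSV once -- cheaper and safer than reopening the files
# on every iteration of the loop.
pd.DataFrame(perf_rows).to_csv("worst_2d_perf.csv", index=False)
pd.DataFrame(vol_rows).to_csv("volatility.csv", index=False)
```

For 400 stocks the in-memory lists are tiny, so a single write at the end costs nothing and avoids half-written files if the loop dies partway.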
My Question:
Is this a very inefficient method? And does this continual use of multiple vectors slow my program down heavily, compared to using a single vector for just one calculation?