*Backstory:* I have recently moved away from using Excel to produce models that predict the chance of being diagnosed with a particular cancer. The model was built in an Excel file and grew in both size and complexity; I used Excel's Solver to iterate through simulations, and the file reached 500 MB+, so I was essentially starting to cross over into the realm of 'big data'.
My question to the Stack Overflow community is: what is the best methodology for continuing this research? My hunch is that storing the data in a database and querying each parameter for individual analysis is one possibility. My old Excel methodology fitted a non-linear regression to each parameter (from historic data), enabling the calculation of a percentage chance of acquiring said cancer specific to that individual parameter. The algorithm then weighted each parameter to produce a final score, on which I performed a logistic regression to calculate a person's overall chance of developing the cancer.
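To make that concrete, here is a minimal sketch of how I imagine the Python version of this pipeline could look. Everything here is an assumption on my part: the database file `risk_data.sqlite`, the `patients` table, the column names, and the saturating curve shape are all hypothetical placeholders for my actual data and fitted forms.

```python
import sqlite3

import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from sklearn.linear_model import LogisticRegression

# Hypothetical database, table, and column names -- substitute a real schema.
conn = sqlite3.connect("risk_data.sqlite")
df = pd.read_sql("SELECT age, bmi, smoking_years, diagnosed FROM patients", conn)
conn.close()

def saturating(x, a, b):
    """One assumed non-linear form: risk rises with the parameter, then levels off."""
    return a * (1 - np.exp(-b * x))

# Fit one curve per parameter against the observed outcome, then use the
# fitted curve's output as that parameter's individual risk score.
param_cols = ["age", "bmi", "smoking_years"]
scores = pd.DataFrame(index=df.index)
for col in param_cols:
    popt, _ = curve_fit(saturating, df[col], df["diagnosed"],
                        p0=[1.0, 0.01], maxfev=10_000)
    scores[col] = saturating(df[col], *popt)

# Logistic regression over the per-parameter scores; its fitted coefficients
# play the role of the weights I previously tuned by hand in Solver.
model = LogisticRegression()
model.fit(scores, df["diagnosed"])
df["predicted_risk"] = model.predict_proba(scores)[:, 1]
```

One difference from my old workflow: here the logistic regression learns the parameter weights itself rather than applying hand-tuned weights before the regression step, and I am not sure whether that is the right translation of the Excel approach.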
Any suggestions, comments, pointers, and constructive criticism would be greatly appreciated. I have recently made the switch from Excel to Python to continue this work.

Kind regards, AEA