
I would like to know the best practices for building predictive modeling solutions organically. Some of the questions I have are:

  • If I have multiple R model files, what are efficient ways of storing them?
    • Save as .Rdata files on the file system
    • Serialize to a database as binary objects
  • Since the data is processed into an interim, model-specific format, is it helpful to use standards such as PMML?
  • Also, should one consider practices such as MVC? (I'm not a trained software developer, so any insights into such development practices would be very helpful.)
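To make the two storage options in the first bullet concrete, here is a minimal R sketch (the model and file names are illustrative only): `saveRDS()`/`readRDS()` handle the file-system case, while `serialize()` produces a raw vector that can go into a database BLOB column (e.g. via DBI parameterized queries).

```r
# Fit a toy model to illustrate both storage options.
model <- lm(mpg ~ wt, data = mtcars)

# Option 1: save to the file system. saveRDS()/readRDS() store a single
# object and are often preferred over save()/load() for model files,
# because the reader controls the variable name on load.
path <- file.path(tempdir(), "mpg_model.rds")
saveRDS(model, path)
restored <- readRDS(path)

# Option 2: serialize to a raw vector, suitable for inserting into a
# database BLOB/bytea column.
blob <- serialize(model, connection = NULL)
roundtrip <- unserialize(blob)

stopifnot(identical(coef(model), coef(restored)),
          identical(coef(model), coef(roundtrip)))
```

Either way, it is worth storing metadata (training date, data version, package versions) alongside the serialized object, since models saved from one R/package version may not deserialize cleanly in another.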

I apologize for the open-ended nature of this question. I wish to understand even simple things such as the recommended folder structure for data staging, the model store, the scripts collection, and other elements of a data mining solution.
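As a starting point for the folder-structure question, one common convention is to separate immutable raw data, staged/interim data, serialized models, scripts, and reports. A hypothetical skeleton (all directory names are illustrative, not a standard) could be created like this:

```r
# A minimal, hypothetical project skeleton; names are illustrative only.
project <- file.path(tempdir(), "modeling_project")
dirs <- file.path(project,
                  c("data/raw",      # immutable source data
                    "data/staging",  # cleaned / interim formats
                    "models",        # serialized model store
                    "R",             # scripts and reusable functions
                    "reports"))      # generated output
invisible(lapply(dirs, dir.create, recursive = TRUE))
stopifnot(all(dir.exists(dirs)))
```

The key design choice is that raw data is never modified in place; everything under staging, models, and reports should be reproducible by re-running the scripts.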

I would be very grateful to members of the community for sharing their experiences and recommendations. Thank you for your time.

harshsinghal
  • Regarding workflow, see http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing – Chase May 04 '11 at 12:06
  • From a statistical point of view, you had better check elsewhere. From a programming point of view, the following questions also cover this topic: http://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs , http://stackoverflow.com/questions/2712421/r-and-version-control-for-the-solo-data-analyst , http://stackoverflow.com/questions/2860314/essential-skills-of-a-data-scientist , http://stackoverflow.com/questions/3097598/version-control-for-one-man-project-using-eclipse , ... – Joris Meys May 04 '11 at 12:20
  • To the OP: This is a pretty good question as questions go, but giving a useful, detailed response takes a lot of effort. That makes it a difficult question to answer on SO. – Iterator Aug 03 '11 at 21:42

0 Answers