
I am currently designing a survey system, where a Survey has many Questions, a Question has many Answers, and a Response belongs_to a User, Survey, Question, and Answer.

I will have a lot of demographic data in the User model and expect hundreds of thousands of responses to various questions.

Eventually we will want to analyze the responses, for example: 80% of males like bananas, 20% of females own a Ford, and so on.

I am looking into statistical packages such as R, SAS, and SPSS, and am wondering whether my data will need to be structured in any specific way to be used by these programs. Or do they all accept CSV files?

Do you have any advice on storing statistical data and structuring data models for it?

Finally, how much do SAS, SPSS, and Stata cost?

Kamilski81
  • This question is a little too broad, especially the last part. If you design a good structured database to hold your information, I would suppose that any decent statistical package would be able to interface with it. I know R can, I would be shocked if SAS couldn't, don't know about SPSS. – Ben Bolker Jun 05 '12 at 21:23
  • I agree that any statistics package can handle this stuff. However, Stata data files are particularly adept at storing survey metadata. You can get ideas for how to replicate in R (or in a more cross-platform way) in these questions: http://stackoverflow.com/questions/5335745/how-do-i-handle-multiple-kinds-of-missingness-in-r http://stackoverflow.com/questions/7979609/automatic-documentation-of-datasets – Ari B. Friedman Jun 05 '12 at 22:08
  • You'll have to ask the companies how much their products cost. – mdsumner Jun 06 '12 at 01:12

1 Answer


CSV files are more than enough. R can easily manage any data arranged in rows and columns.

For example, you can lay out the CSV columns as variables/responses with a header row and use the rows for the observations, or vice versa.
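As a rough illustration of that layout, here is a minimal sketch in Python (standard library only) that flattens survey data into one row per response with the user's demographics attached. The record contents, the column names, and the helpers `flatten` and `to_csv` are hypothetical stand-ins for the question's User and Response models, not part of any real export.

```python
import csv
import io

# Hypothetical records standing in for the User and Response models.
users = {
    1: {"gender": "male", "age": 34},
    2: {"gender": "female", "age": 28},
}
responses = [
    {"user_id": 1, "survey_id": 10, "question": "likes_bananas", "answer": "yes"},
    {"user_id": 2, "survey_id": 10, "question": "likes_bananas", "answer": "no"},
    {"user_id": 2, "survey_id": 10, "question": "car_brand", "answer": "Ford"},
]


def flatten(users, responses):
    """Yield one flat row per response, with the user's demographics attached."""
    for r in responses:
        yield {**r, **users[r["user_id"]]}


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text, starting with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Columns are variables; each row is one observation (one response).
FIELDS = ["user_id", "gender", "age", "survey_id", "question", "answer"]
csv_text = to_csv(flatten(users, responses), FIELDS)
print(csv_text)
```

A file in this shape, with a header row and one observation per line, is the kind of input a statistics package can read directly.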

It doesn't matter as long as the data is arranged in rows and columns. Comma- or space-delimited columns are easily handled. You are not limited to those, either: you can use any delimiter, and R has powerful regular expression matching.
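To show that the delimiter is only a parsing detail, here is a small sketch in Python's standard `csv` module (chosen just for illustration; the sample text and the helper `read_rows` are made up). The same records are parsed from comma-delimited and tab-delimited text and come out identical.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# The same two records, once comma-delimited and once tab-delimited.
comma_text = "gender,answer\nmale,yes\nfemale,no\n"
tab_text = "gender\tanswer\nmale\tyes\nfemale\tno\n"

comma_rows = read_rows(comma_text)
tab_rows = read_rows(tab_text, delimiter="\t")

# Both parses yield the same records regardless of the delimiter used.
print(comma_rows == tab_rows)
```

In R, the comparable knob is the `sep` argument of `read.table` and `read.csv`.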

My only suggestion is to make separate CSV files for the different data sets, which keeps things simple; each file can then be imported into a data frame easily.
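A minimal sketch of how separate files can be combined for analysis, written in Python for illustration only. It assumes two hypothetical exports (shown here as in-memory strings rather than real `users.csv` and `responses.csv` files) that share a `user_id` column; the helper `share_by_group` is likewise invented for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical contents of two separate exports, one per data set.
USERS_CSV = "user_id,gender\n1,male\n2,female\n3,male\n"
RESPONSES_CSV = (
    "user_id,question,answer\n"
    "1,likes_bananas,yes\n"
    "2,likes_bananas,no\n"
    "3,likes_bananas,yes\n"
)


def load(text):
    """Read CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def share_by_group(users, responses, question, answer, group="gender"):
    """Fraction of respondents in each group who gave `answer` to `question`."""
    group_of = {u["user_id"]: u[group] for u in users}  # join key: user_id
    asked, matched = Counter(), Counter()
    for r in responses:
        if r["question"] != question:
            continue
        g = group_of[r["user_id"]]
        asked[g] += 1
        if r["answer"] == answer:
            matched[g] += 1
    return {g: matched[g] / asked[g] for g in asked}


shares = share_by_group(load(USERS_CSV), load(RESPONSES_CSV), "likes_bananas", "yes")
print(shares)
```

Keeping demographics and responses in separate files avoids repeating every user attribute on every response row; the join on the shared key happens at analysis time.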

Once that is done, you are free to unleash the power of R.

Subs