1

I need to store a lot of demographic data that I am collecting for my research in database. I am trying to find out what would be a good solution to store data such as this:

http://factfinder2.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_11_3YR_S0201&prodType=table

My goal is to store all this data in such a way that I can retrieve sections of data when I need. The section of data I would be interested in retrieving is "Sex and Age" part of data or "Households By Type" part.

I am new to designing database tables and such. Could some one guide me in this regard?

Clarification:

  • My problem here is not that the data is huge. The problem I am having is that the data I am collecting is very diverse. I am wondering if I should spread each of those in different tables. Should I have one table for the "Sex and Age" category and another for "Household by type" category?
  • And another issue is metadata information. I would need to know information that classifies what is the source of this data set etc.
  • The data points itself may not run into millions at this point. I might have the need to store this information at the state level. My priority would be that the data needs to be easy to retrieve. Storing information from one page like the one in the link above across multiple tables is that it would be quite tricky to put the data back together. Also remembering what went into what table wouldn't be easy.

Has anyone else had to store data like this? What solution worked for you best?

Community
  • 1
  • 1
Goutham
  • 2,049
  • 3
  • 15
  • 19

1 Answers1

2

Although, your question is fairly vague, I guess by 'large', you mean few million rows. If your data-size is limited to such figures, you can simply design a database and dump the data. Since you are also tagging OLAP for the purpose, you may be interested on creating OLAP cube or something similar. You have to collect the data, may be in flat txt file, CSV file, excel file or anything similar. You can also use bcp to export/import bulk data. Please refer to this link. Furthermore, you can dump those data to SQL Sever (or any other that you like). Create table(s) and then if you like to create a OLAP cube, create dimension and fact tables. Create an OLAP Cube. Whenever you need data related to your point of interest, do browse your cube. If you are interested in data retrieval only, then you can query your relational database easily. I hope this will somewhat helpful to edit your question.

Sabin
  • 11,662
  • 3
  • 25
  • 39
  • A "few million rows" is not large. Not by any measurement I know. A few billion rows is large. – Mitch Wheat Jun 03 '13 at 03:02
  • Yes, if it is about billion of rows, still you can use SQL Server. However you need to consider more things like Scalability, Recoverability, Reliability and Availability. I think the following [post](http://stackoverflow.com/questions/2794736/best-data-store-for-billions-of-rows) is helpful. – Sabin Jun 03 '13 at 03:14
  • You're missing the point. I'm well aware of SQL Server's capabilities. Your post states "I guess by 'large', you mean few million rows.". A few million rows is nothing. It's so commonplace that it is hardly worth mentioning. – Mitch Wheat Jun 03 '13 at 03:29
  • @MitchWheat Thank you for answering my question. I have edited my question to clarify some points. The issue I have is not that there are billions of data points. I have lots of diverse data each with small amounts of data (~10000 data points). – Goutham Jun 03 '13 at 03:32
  • @Mitch Wheat Sorry for the confusion. Now that the question is clear, my answer is nowhere close to the solution. Should I remove my answer? – Sabin Jun 03 '13 at 03:44
  • I am sorry about not explaining properly. Its partly due to the fact that I am quite new to using databases. I wasn't using the correct terminologies, I guess. I want to get a feel for how experts would approach this problem. – Goutham Jun 03 '13 at 03:49