18

All, I have to create a single database which has a basic schema. This database is going to be large (100GB plus), and is to be used as a data warehouse. Now, the creation of this database is currently being performed in 'one-hit' through a C# code; pulling data from a number of different sources. Due to the sheer volume of data, this creation is causing some problems. It has been decided that instead of creating a single database in one go; to

  1. Create several smaller databases containing the core data tables.
  2. Merge the smaller databases into a single larger database.
  3. Build the schema/add the relevant constraints.

My question is two-fold.

1. What is the best way to merge multiple databases (with identical schemas/table structures)?

2. Is there a better way of going about creating this data warehouse (extremely large database)?

Thanks for your time.

MoonKnight
  • 23,214
  • 40
  • 145
  • 277
  • 2
    Why does creating one large database cause problems, and why wouldn't creating many small databases cause the same problems? – Tim Rogers Sep 17 '12 at 11:31
  • To create the 'large' database takes six hours+. The majority of this data to form this database is pulled from other databases (sometimes accross a network). Due to the amaount of data we can get hangs on the server instance creating this database. This is the reason for attempting to modulerise the proceedure, so when/if there is a drop out we donot end up with a corrupt database. – MoonKnight Sep 17 '12 at 12:01

3 Answers3

8

for your first question : You have mentioned identical schema and table structure, in that case, its simply moving of data from one DB table (i.e smaller DB) to another (i.e larger DB). for this have to ensure

1) there was no duplicate of data ( at-least in PK field )

2) move data from one db to another for sql server refer

Transfer data from one database to another database

Community
  • 1
  • 1
solaimuruganv
  • 27,177
  • 1
  • 18
  • 23
  • 1
    What to do if the PKs are not unique? In my schema one table has autoincremented ID as PK, and that field is used in another table as FK. – handras Jan 28 '19 at 16:44
7

SQL-Hub (http://sql-hub.com) will let you merge multiple databases with the same schema in to a single database. There is a free licence that will let you do this from the UI though you might need to pay for a license if you want to schedule the process to run automatically. However, if there are very big volumes of data it might prove a bit slow for what you want to achieve - you could try it out though.

There are also some articles here that look at some other options for these sorts of problem.

Scott Bennett-McLeish
  • 9,187
  • 11
  • 41
  • 47
Alan Hickman
  • 71
  • 1
  • 1
5

sola's answer is good and will work. Here is another alternative: Don't merge at all. Create a union-all view on top of the multiple base tables that your import creates. That saves you a lot of data movement. Such a view is called a partitioned view.

usr
  • 168,620
  • 35
  • 240
  • 369
  • I am aware of partitioned views, however, the data warehouse database must be stand-alone. This database must be provided to the client as a single database entity (possibly spread across multiple .ndf etc.). I need to be able to automate the process of the database create, thus the method suggested by @sola is not one I can utilise. Thanks for your time. – MoonKnight Sep 17 '12 at 13:33