SQL design pattern: which one to choose?

Question

Concept:

I have a SAAS application. I have like 2-3 customers, more to follow. Each customer has his own database and our server connects to them depending on login information. On each of these databases there is a table, let's call it saas_table, which contains a unique column: id. There are and will be 3000 rows on each of these table at most.

My server has a database with one table: subscriptions. This table holds information about my customers, their database credentials and so on. I want to duplicate the id of the saas_table from each customer database and store it in my server database.

Approach?

Which is the better approach: creating a giant table log_table on my server's database and store the records there, or creating a table for each customer log_1_table, log_2_table, etc and store the information based on each customer? I must point that the hypothetical log_* table will be accessed very often, at least twice per session.

I think storing the ids on separate tables will have higher performance as tables will be cached in RAM. But this requires more programming and possibly new core version of my application. On the other hand, since each customer has traffic, the records from log_table will also be accessed often thus cached in RAM as well. Which is the better approach?

This is a faq. Before considering posting please always google any error message or many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings/names & site:stackoverflow.com & tags; read many answers. If you post a question, use one phrasing as title. See [ask] & the voting arrow mouseover texts. — philipxy, Feb 20 '20 at 09:46
There's no such thing as "better"/"best" in engineering unless *you* define it. Also unfortunately all reasonable practical definitions require a ridiculous amount of experience with a ridiculous number of factors that interact with chaotic sensitivity to details. Make straightforward designs. When you demonstrate via measurement that a design and all alternatives you can think of have problems (whatever that means at the time), then ask a very specific question. Which should also define "better"/"best". [Strategy for “Which is better” questions](https://meta.stackexchange.com/q/204461) — philipxy, Feb 20 '20 at 09:48
[Single or separate databases for separate customer accounts?](https://stackoverflow.com/q/4449621/3404097) Etc etc. — philipxy, Feb 20 '20 at 09:52

Caius Jard · Accepted Answer · 2020-02-20T09:56:31.090

Personally, I would never store in a tablename, information that should be stored in a column - and I think this is information that should be stored in a column. Accessing a table twice per session is fairly low usage, even for thousands of customers. The last stock trading platform I worked on would see customers hitting many tables several times per transaction, and there were hundreds of thousands of transactions per hour and a person's trading standing was calculated by looking back over billions of transactions and summing to find their current position. Your pondering about performance right now is a premature optimization, and it's distracting you from the bigger problem that you're considering creating an engineering and maintenance headache.

Chiefly, you're overlooking the fact that most ORMs and data access strategies by higher level libraries like Entity Framework is geared towards treating a table like an compiled-in object type that contains variable data (rows) of fixed proeprties (columns). Having your proposed structure of a table per customer (of what is identical information) creates a headache like:

DateTime recentLogin;
if(currentUser == "IBM")
  recentLogin = dbContext.IBMLoginLog.Max(x => x.LoginDate);
elseif(currentUser == "Microsoft")
  recentLogin = dbContext.MicrosoftLoginLog.Max(x => x.LoginDate);
elseif(currentUser == "Facebook")
  recentLogin = dbContext.FacebookLoginLog.Max(x => x.LoginDate);
...

This app needs recompiling every time you add a new customer. It should have been like this to cope with the things you anticipate to change):

DateTime recentLogin = dbContext.LoginLog.Where(x => x.User == currentUser).Max(x => x.Logindate);

You might say "but I can do the same with strSQL = "SELECT MAX(date) FROM " + currentUser + "LoginLog" but that's an accidental byproduct of the fact that your SQL is compiled from the string every time it's run and can hence cope with the changing table name - it's not an indication that it's a sensible or good way to design a program, in the same way that you wouldn't write some C# program, that wrote a text file containing C# to disk, compile it and run it, just to change the name of the table you're looking in:

string csharpCode = "...; DateTime recentLogin = dbContext." + currentUser + "LoginLog.Max(x => x.Logindate); ..."
File.WriteAllText(@"c:\temp\getdata.cs", csharpCode);
Process.Start("csc.exe", @"c:\temp\getdata.cs");
Process.Start("c:\temp\getdata.exe");

(It's a stupid example; no-one would do this - except that I'm advocating that by stringing your SQL together and sending it to the SQL Server, that is exactly what's happening)

Instead, take a cue from things like Entity Framework and even SQL itself in that only certain things can be parameters:

--valid
SELECT MAX(LoginDate) FROM Logins WHERE Client = @clientName

--not valid
SELECT MAX(LoginDate) FROM @clientName+Logins

--valid, but again, that recompiling thing:
EXEC 'SELECT MAX(LoginDate) FROM ' + @clientName + 'Logins'

Have your one table, index the client name (and maybe include other columns you'll frequently want the data of, like here I might CREATE INDEX whatever ON Login(ClientName) INCLUDE (LoginDate) to create an index that can be looked in for client name, and also the index knows the login date, so it can answer the max(logindate) query without the server needing to then hit the table and retrieve the rows it found from the index, to get the date

SQL design pattern: which one to choose?

1 Answers1