Friday, March 16, 2012

"Schemas" in same database vs multple Sql Instances

We have a DataWarehouse project. We are considering using table names (in the same database) like this:

Acct.Payroll

...

Claims.Payroll

.....

Shipping.Payroll

...

where the "Schema" differentiates the "Payroll" tables. The tables have similar functions but are quite different internally. We expect these table to grow significantly over time.

Question: From a performance\ maximum database capacities, admin perspective what the pros/cons of each approach:

1. Schema approach

2. Multiple instances of sql (up to 16) on a single server

3.Seperate databases on the same server

TIA,

barkingdog

First, having different schemas is the simplest and "fastest" approach.

Second, would be seperate databases. There is very little overhead in accessing another database on the same server.

WAY, WAY down the list, about 412 on the performance scale, would be using seperate instances. Connections are very cpu and memory intensive as well as running the entire SQL server instance, etc. I would not recommend it for this application.

I would highly suggest using file groups and many hard drives to spread the usage in one database with multiple schemas.|||

If the assumed parts of you logical design are supposed to have some logical consistency between them, you would most probably need to use transactions that span over multiple functional partitions in your application. If so, then:

1. With multiple instances/databases you won't be able to use declarative referential integrity between the tables residing on different instances/databases.

2. With multiple instances you will be paying a much higher price of the full-blown 2-phase distribution transaction commit protocol for the transactions that span multiple instances. Although for multiple database the engine also uses a variation of a 2-phace commit protocol, but in fact it's very lightweight and does not actually incur a noticeable performance overhead.

3. With multiple instances/databases if the 2-phase coordinator becomes unavailable all the participating transaction on the other instances/databases that have prepared but have not receive the commit message will not able to make any progress and will need to wait until the coordinator comes back online. Quite often resolving situations like this requires manual intervention.

4. With multiple instances you won't have automatic deadlock detection - it might become a major administrative headache because you would need to 'detect' and 'resolve' the deadlocks manually.

5. With multiple instances/databases a sound backup strategy is more complicated to design and to implement.

Probably the only real benefit (besides somewhat higher availability if the setup is right) that one could get from going multi-instance is when a single machine, however powerful it is, is unable to handle the workload. If so then you really need a scale-out solution and in this case the disadvantages don't matter and you will have to pay the price.

Assuming that you properly partition your tables and/or design the placement of the tables in filegroups, the best possible option would be to use multiple schemas in the same database. Unless, of course, a single machine is unable to handle your workload.

Another possible exception is the case when certain database level options are not applicable to all partitions of your application, For instance – the Payroll would benefit from the row-level versioning, but the Claims for some reason wouldn’t. If you don’t want to pay the versioning overhead for the Claims then you would probably want to move it to a separate database on the same instance.

|||

From a raw performance perspective on a single database server, using schemas would be best. It also would be the easiest to develop and administer.

Using multiple databases in the same instance would perform fine, but would be a little more complicated, since your applications would need different connection strings. Backups/restores would be quicker.

With multiple instances on a single database server, the available RAM will be divided between all of the instances. You will also have to deal with different connection strings.

If you find that your single database server cannot handle the workload in the future, and you don't want to scale up, you would want to have these three tables on separate databases on separate servers to scale out instead.

No comments:

Post a Comment