I have a lot of experience with sql server and as the years have progressed, I have gotten much picker about my relational design. I enforce referential integrity whenever possible. However, I'm running into a pattern in my current project that seems to defy referential integrity, or at the least, cause me to have to write a lot of extra code to allow sql server to enforce it.
For example, I have a comments table:
-
CommentID int identity (the pk)
UserID int (ID of the user who made the comments)
DateCreated smalldatetime (date it was created)
CommentsText ntext (the comments)
-
This is all fine and good, except I need to relate this table to a multitude of other tables. If I enforce RI, I have to make a seperate column for each entity the Comments table might be related to, and allow that column to be null. For example, my Comments may be related to tables such as Project, Contract, Ammendment, FundingRequest, and many others. So my Comments table gets "cluttered" with a bunch of nullable foreign keys:
ProjectID int null
ContractID int null
AmendmentID int null
FundingRequestID int null
etc...
-
this requires adding code every time a create a new entity that needs comments related to it. Alternatively, I could make seperate junction tables for each relationship, such as a ProjectToComment table with commentID, projectID and so on for each table, but that seems worse.
A guy here suggested that I turn off RI for these relationships to my comments table, and give it a CommentTypeID which related to a CommentType table with ID/Name columns. The CommentType table would hold what I'm related the comments to: Project, Contract, etc. Then the Comment table would also have GenericID column, which would be a foreign key to either the project, contract, ammendment, etc. tables. I would have to turn off RI, but could enforce a semblence of RI using a trigger, which would check the CommentTypeID, then check one of the tables the comments could be related to make sure the foreign key exists in one of the parent tables. So my final schema would look like this:
-
COMMENT
CommentID int identity (the pk)
UserID int (ID of the user who made the comments)
DateCreated smalldatetime (date it was created)
CommentsText ntext (the comments)
CommentTypeID int not null (the entity type it is related to: Contract, Project, Amendment, etc.)
GenericID int not null (an FK to one of the above tables, based upon the CommentTypeID)
-
-
COMMENTTYPE
CommentTypeID int not null identity (the PK)
CommentTypeName nvarchar(50) (the name of the table i represent: Project, Contract, Amendment, etc.)
-
This sounds like a clean solution that requires a lot less code, still keeps data from getting corrupt and allows approriate queries and reporting. It seems to violate Referential Integrity in a sense. I'm basically genericizing certain relationships for my own sanity.
Is there another option that I'm missing here? I must add that I'm seeing this issue repeatedly in my code, where a single table might be related to many other tables that may or may not be related to each other.
Thanks,
Fregas
I typically design this using a comments table per type. If you want to provide unified view, it is always possible to do that via view on top of these tables. Each approach however has it's pros and cons. In SQL Server 2005, you can also consider storing the comments as XML value in each table mapping to same schema collection. This however does not provide the capability to edit say comments independently. So you need to consider the requirements for your application, what type of operations you want to support, what type of queries you would require, do you need aggregated view of the data and so on. The answer is that it depends and there is no one way to go about this.|||
You can do one of three things (well, you can do a million things, but here's the ones I've used in the past)...
1. If there is only one comment per row in each of the child tables, put the CommentID in each of the child tables.
2. If there can be more than a single comment per child table row you can:
A. Create a uniqueidentifier for each of the child tables, and create a single junction table that contains the unique identifier of the child table and the CommentID field. This way you can relate each child table to the junction table, but you don't need to create a hundred junction tables.
B. Create a third table called CommentConsumers and add a CommentConsumerID field (the PK from the CommentConsumers table) to each of your child tables that will use (consume) the comment table. Upon inserting a row into a child table, a new row is added to the CommentConsumer table, and the ID is stored in the child table. Create another table that is a junction between a CommentConsumer and Comments table (contains a CommentConsumerID and CommentID). That table will tell you which child owns which comments with full RI and without using unqiueidentifiers.
I prefer 2.B. in my designs.
|||Fregas,
While I'm more asleep then awake right now... You basically had your answer before you asked it here. It sounds like you're someone that likes to get multiple uses out of your tables; no need creating multiple tables if you all ready have a table that has the schema you all ready need right...
What I do in this regards is assign each interface or application an unique ID (interface_id) and I store this ID in these tables. This then allows you to create reports and relationships etc as you so choose. Also keeping a generic name (interface_id) keeps in simple...which is one thing I learned a long time ago and tell myself often... KISS, Keep It Simple Stupid. Not directed at you at all, but something I tell myself as often as I can.
Oh yea, the CommentType table you mention above becomes an interface table which also helps me define security for example and many other interface or application type things. Each interface (or application) has an unique ID.
Hope it helps.
On another note, I often stay away from identity fields. You can get the next ID with select max to use with your inserts.
|||wkb wrote: On another note, I often stay away from identity fields. You can get the next ID with select max to use with your inserts.
If you do that, you'll definitely need to do it in a transaction. The isolation level on the transaction will have to be serializable (or similar) so that another operation doesn't grab the same SELECT MAX(ID) value, and that will lock the table up. Hopefully the ID field isn't also your clustered index.|||This is true and hopefully went without saying. If it where the clustered index, why would it matter?|||
wkbia wrote: This is true and hopefully went without saying. If it were the clustered index, why would it matter?
Because it would hotspot the last data page (i.e. since the pages are now laid out physically in numerical order via the clustered key, all of the inserts are going to be at the end of the table. In general, for a table in which you will be doing a lot of inserts, you want to make the clustered key something that will cause the inserts to happen in a way that distributes the data somewhat evenly over the data pages). You almost never want the clustered index to be your autonumber field.
The reason I mentioned that stuff about making one's own identity field re: transactioning is because if you use a SQL Identity field, you don't have the problem of locking the entire table you're inserting into nor do you have the extra read operation, nor do you have to be careful about duplicate values and therefore do not have to possibly debug mistakes with the algorithm to generate the ID. I notice on occassion people have a problem with identity fields, however, I happen to love them; maybe I'm missing something? If I am missing a down-side, please let me know.
No comments:
Post a Comment