How to use a foreign key constraint


A foreign key constraint is used to make sure that columns involved in a relationship contain only correct data. I start out the article with an example that contains a foreign key reference. The reason for this is because it's probably the most frustrating of errors; it's rarely a problem in most databases, so it's not the first thing that will pop to mind. If you don't take the time to add the foreign key constraint, then the value for order- Number in the orderItem table isn't bound to any domain of values. It could be "Hello!" for all that matters, even if that value didn't get created in the order table. The problem is that no matter the intentions of the non-data tier implementers, it's difficult to test for all possible scenarios, and impossible to stop a DBA from messing up the data by accident. A good example of the kinds of problems you can run into is a query against a child, such as an invoice line item. A query against this table should give consistent results whether or not you include the invoice in the query. On the surface, this seems a safe bet, as why would you put data in invoiceLineItem without a matching value in invoice?

But unless you've created a constraint between invoice and invoiceLineItem, you could have values for invoiceLineItem.invoiceId that aren't in the invoice table. These values would be excluded in the second query, causing the user to have to think about why this is true. Simply adding these values could prevent a lot of heartache trying to decipher the random bits and bytes from the meaningful data. Even worse, note that you'd also be required to check for bad data like this in any program that uses this data (such as ETL for your warehouse), or you'd corrupt the sources of data that use this data, requiring even more data cleanup. Check constraints allow you to define, for an individual column or row, a condition of the data that must occur. You use them to constrict the set of values to a certain domain. For example, if you want to make sure that the values could only be entered into the status column of the invoice table, you'd define a condition that said status in ('active','inactive'). No matter how many rows you modify in a statement, the condition specified by a constraint is checked one row at a time, and can't see which other rows are modified. You can access the columns in a table by name with no alias, or you can access data in other rows, or even tables using user-defined functions. The Boolean expression can't directly reference another table, but you can use a function to access the data in the table. Note that I said you can't see which other rows were modified, but the data is already in the table, so you can access it in the function's query.

One example of using a function is limiting the cardinality of a value in a database (or the count of the times a value can appear). Using a unique constraint, you can easily limit the cardinality to 1, but if you want to allow a number different than 1, there isn't an easy way to do this declaratively. So we can use a function and a check constraint. If we want no more than two of the same values for the value column, you can build a simple function. It counts the number of values in the table that match the @value parameter. Constraints see the data in the table from the current DDL statement. Now you should be careful to test the statements. After the third one, you'll get an error. After creating a constraint that accesses other tables, you should test all of the possible types of combinations of rows you may get. For example, we tested the case of sending one row at a time, but in this case you should test modifying multiple rows in a single DML statement to make sure that it behaves correctly. This again fails with the same error message. (I'll leave it to you to test the UPDATE scenarios.) This leads to the most difficult problem with check constraints: terrible error messages. Check constraints have a downside, but they're important to the process of keeping the data clean of illogical data. Done right, your reporting and ETL process won't have to check for out-of-range values that can trip up calculations (and as mentioned, make users wary of trusting "your" numbers). Just like we did with the salary example, you should make judicious use of check constraints to prevent data from being fouled up in an unreasonable manner. For example, I generally put a constraint on all varchar columns to make sure that the empty string isn't inserted.

This isn't to say that I always enforce this sort of constraint, as sometimes allowing an empty string is a good thing, because it allows you to differentiate between no value and a NULL value. But, when a column value is required, it's usually not the best plan of attack to allow the user to enter a single space character in lieu of a "real" value. And for unique indexing purposes, I really don't want strings containing different numbers of spaces to be treated as different values; users sometimes use this technique to get around putting in reasonable data values (though you can't stop them from putting in gibberish for values, at least not easily). One last note: when you create a CHECK constraint, you can specify WITH NOCHECK to tell SQL Server to not test existing data against the constraint. This can be a faster way to add a constraint, but generally speaking, it's not a good idea.

Two problems often come up. First, the constraint can't be used by the optimizer to know what data could exist in the column(s). Second, if there's invalid data, it'll fail if it's updated to the same value. UPDATE table SET column = column is something that should never fail, but untrusted constraints can leave you open to these kinds of spurious errors that the user interface probably couldn't even help prevent. You can determine whether your constraints are tested by looking at the is_not_trusted column in sys.check_ constraints.

Legal Disclaimer

Our website is not responsible for the information contained by this article. Articleinput.com is a free articles resource thus practically any visitor can submit an article. However if you notice any copyrighted material, please contact us and we will remove the article(s) in discussion right away.

Note: This article was sent to us by: Jeff Davis at 01072010

Related Articles

1. Definitions and uses of functions in Python
This article assumes you're familiar with function definitions in at least one other computer language and with the concepts that correspond to...

2. Python functions handle variable numbers of arguments
Python functions can also be defined to handle variable numbers of arguments. You can do this two different ways. One way handles the relativel...

3. Lambda expressions and generator functions in Python
Short functions like those you just saw can also be defined using lambda expressions of the form. Lambda expressions are anonymous little funct...

4. How to create a basic program in Python
Up until now, you've been using the Python interpreter mainly in interactive mode. For production use, you'll want to create Python programs or...

5. How to make Python script execution in UNIX and Mac OS X and in Windows
If you're on UNIX, you can easily make a script directly executable. Note that if Python 3.x isn't your default version of Python, you may need...

6. The difference between scripts on Windows scripts on UNIX
The way you call scripts on Windows differs from the way scripts are called on Linux/ UNIX, and that difference can affect what kind of scripts...

7. Python applications are distributed as source files
You can distribute your scripts as source files (as .py files). You can also ship them as byte code (as .pyc or .pyo files). A byte code file w...