Client complaints about inconsistent query results


You've probably seen it before: a client has called the help desk and reported that a couple of queries have seemingly inconsistent results. You're called in to test the queries, so you start digging. You run two queries, and the results for one query don't seem right, considering the other query's results. So you start to wonder, "Is it my understanding of the schema?" You then start looking at the data model and the tables seem wonderfully designed. Your query should work. Dang, now what? You dig deeper, and realize... the only objects you find in the database are tables. No constraints, no relationships, no nothing.

A quick query later and you find that orphaned rows in a child table show up in one query but not another. At this point it's clear that almost all of the data in these tables is suspect because if you can't trust that a simple relationship works, can you trust that the child rows that exist are for the proper parent? Sadly, the answer is no, because the software may have created a new parent row that reused a key value. Now ideally you'll test the consistency of the data every time you use the data, to verify that it's correct. It doesn't take a genius to determine how long before that gets old. So how do you avoid this calamity? Protect the data using all of the tools you have available. In this article, I'll present many of the tools that SQL Server gives you to help protect your data from corruption, usually in a behind-the-scenes manner that the client doesn't really have to worry about. I'll also present guidance on how to decide what type of protection and where, and give examples of each type of protection. I'll break down the topic into the following two parts: protection tools - introducing the tools that SQL Server gives us and techniques for using them; when and why to use what tool - examining the different types of protection tools and discussing when and where they should be used

The goal will be to protect the data such that it's guaranteed to be fundamentally correct, meaning that completely illogical or impossible data isn't allowed. For example, consider a column that holds an employee's salary. Logically, it'll be a number value, and should probably have a reasonable upper and lower bound. What those bounds are is up to the individual situation, but the minimal goal is to make sure completely illogical data is never stored. Without any real concern, you can set a lower bound of 0 on a salary, as by definition a salary is pay, not a fee to come to work (that is what timesheets and staff meetings are for). You might use a numeric(15,2) data type to establish an upper bound of 999,999,999,999.99 for the salary value (even the CEO of Enron never made that kind of money).

The application code will be used to warn users that the average salary for a data architect is less than 200 million dollars... but if you want to pay that much, I'll take it. The final result is that there's no way that the value too freaking little will show up in the salary column. Here I'll introduce the different tools available for you to protect the quality of your data. Understanding the tools at your disposal for protecting data quality is the second most important thing to know and know well (the third being how to write set-based queries, and the first being normalization). This is about automatic data protection, meaning tools that can be used to seamlessly protect the integrity of the data without any further coding effort or user action to make sure that they work, and no method of overriding the protection without cheating (you can disable some protections, and you can get around them using bulk copy methods). You can also take manual steps to protect your data; they generally require the client to adhere to a given API to work. This could include stored procedures, client code, defaults, and so forth.

The point of using manual data protection techniques is that they enforce rules that can be overridden or applied situationally, or that are frequently changed. These rules are generally apt to change, so you wouldn't be able to easily write a query to find data that didn't meet the rules over a long period of time.

Legal Disclaimer

Our website is not responsible for the information contained by this article. Articleinput.com is a free articles resource thus practically any visitor can submit an article. However if you notice any copyrighted material, please contact us and we will remove the article(s) in discussion right away.

Note: This article was sent to us by: Steve Miller at 01072010

Related Articles

1. Python functions handle variable numbers of arguments
Python functions can also be defined to handle variable numbers of arguments. You can do this two different ways. One way handles the relativel...

2. Lambda expressions and generator functions in Python
Short functions like those you just saw can also be defined using lambda expressions of the form. Lambda expressions are anonymous little funct...

3. How to create a basic program in Python
Up until now, you've been using the Python interpreter mainly in interactive mode. For production use, you'll want to create Python programs or...

4. How to make Python script execution in UNIX and Mac OS X and in Windows
If you're on UNIX, you can easily make a script directly executable. Note that if Python 3.x isn't your default version of Python, you may need...

5. The difference between scripts on Windows scripts on UNIX
The way you call scripts on Windows differs from the way scripts are called on Linux/ UNIX, and that difference can affect what kind of scripts...

6. Python applications are distributed as source files
You can distribute your scripts as source files (as .py files). You can also ship them as byte code (as .pyc or .pyo files). A byte code file w...

7. How to use the Python file system
Working with files involves one of two things: basic I/O and working with the filesystem (for example, naming, creating, moving, or referring t...

8. How to ensure data quality
The most fundamental thing you can do to ensure the quality of your data is to choose a reasonable container for the value being stored. Want t...

9. How to use a foreign key constraint
A foreign key constraint is used to make sure that columns involved in a relationship contain only correct data. I start out the article with a...