Avoid bugs later on by using appropriate joins

Wednesday, April 10, 2013

Avoid bugs later on by using appropriate joins

One of the most fundamental development principles is that it isn’t good enough if code works for the current situation; you must also consider how to design it so future situations don’t cause it to break. A common T-SQL practice that can lead to future errors is to divide tasks into several queries to approximate a set theory operation—that is, to avoid using a join.

Let’s suppose for example, that our database stores products, and each product is assigned one product type. Let’s say, furthermore, that the code has access to the product type name and the product type code, and you want to see if there’s actually a product with that code with a product type of that name. The required information resides in two tables, as shown in Tables A and B:

Table A: Partial structure of Product_Types table
Column             Data type
Product_Type_ID    int
Product_Type_Name varchar(50)

(Additional columns not shown)    …

Table B: Partial structure of Products table
Column             Data type
Product_code       int
Product_Type_ID    int

(Additional columns not shown)    …

A deceiving shortcut

Naturally, you might consider using a join. But—come to think of it—figuring out the correct join logic may seem a pain. So sometimes, developers break up the logic into more than one query, using T-SQL variables. Listing A shows one example of how you might accomplish this.

Listing A: Two-step stored procedure to determine if a product code exists in a product type
CREATE PROCEDURE
usp_Verify_PCode_Per_PType
@ptype varchar(100),
@pcode int
AS
declare @ptypeid int
select @ptypeid = product_type_id
from product_types
where
product_type_name = @ptype
SELECT ISNULL(
(select 1 from products
where product_code = @pcode and
product_type_id = @ptypeid),
0)

The code processes very straightforward steps:

It finds the product type ID from the product type name. It inserts this value into the variable @ptypeid.
Using this variable, it queries the products table to see if there’s a record with the given product code and product type ID.
The procedure circumscribes the second query into another query which uses the ISNULL() function to convert the value to 0 if there’s a NULL result (i.e., if there are no records). In this way, it returns 1 if it found a matching product, and 0 if it didn’t.

Can you see what’s wrong with this picture? Hint: We didn’t tell you whether the table enforces uniqueness for product type names.

Multiple values inserted into a scalar variable

So, what happens if the product type name belongs to more than one product type? That’s an interesting question. For example, in the following code, @ptypeid is an int variable:
select @ptypeid = product_type_id
from product_types
where
product_type_name = @ptype

The code appears to assume that the SELECT statement will return only one value, but it doesn’t fail if there’s more than one. Instead, SQL Server picks the last value to insert into @ptypeid. In other words, suppose the query returns two records:
3
20

In this case, the value of @ptypeid will be 20. That’s really too bad, though, if our product actually belongs to product type 3. If so, the stored procedure won’t find the match, and will return 0.

The solution: Use joins
The solution is, of course, to just use a join. If you want to know if there exists a product with a certain product code that has a product type of a certain name, you really only need a single query:
SELECT 1 FROM products p,
product_types pt
WHERE product_code = @pcode
AND p.product_type_id =
pt.product_type_id
AND product_type_name = @ptype

To revise the original procedure to use such a query, you can use code similar to Listing B.

Listing B: Code to make the original procedure more robust
ALTER PROCEDURE usp_Verify_PCode_Per_PType
@ptype varchar(100),
@pcode int
AS
SELECT ISNULL((SELECT 1 FROM products p, product_types pt
WHERE product_code = @pcode
AND p.product_type_id = pt.product_type_id
AND product_type_name = @ptype),0)

T-SQL variables still OK

We’re not arguing, of course, that there’s anything wrong with storing values from queries in T-SQL variables. Obviously, this is an essential technique. But it’s still important to recognize when circumstances call for a join.

Pages

Wednesday, April 10, 2013