Introduction
CAUTION! This article does not apply to all versions of SQL 2008, only to SQL 2008 SP1 CU5 (10.0.2746) and SQL 2008 R2 CU1 (10.50.1702) and later. See the section A Tale of Service Packs and Cumulative Updates for more details.
A very common requirement in an information system is to have one or more functions where the users are able to search the data by selecting freely among many possible criteria. This is a tough challenge, because not only must you produce the desired output, but you must also keep the response time within acceptable limits, at least for common searches. And on top of it all, the code must be maintainable, so that you can easily modify it to meet new needs and requirements.
There are two ways to attack this problem: dynamic SQL and static SQL. Up to SQL 2005, it was difficult to find solutions for static SQL that were simple to write and maintain and yet performed well, and the recommendation was to use dynamic SQL. In SQL 2008, things changed. Microsoft changed the hint OPTION(RECOMPILE) so it now works as you would expect. However, there was a serious bug in the original implementation, and you need at least CU5 of SQL 2008 SP1 or SQL 2008 SP2 to benefit from this feature.
Still, as we shall see, a solution with static SQL is in one sense more dynamic than dynamic SQL itself. There are some performance implications, however, and a properly written solution with dynamic SQL can still be the best choice when you expect many concurrent searches. Dynamic SQL also remains the best choice when you need to support very complex search options.
This article exists in two versions. This version is for SQL 2008 SP1 CU5 and later. The other version is for SQL 2005 and earlier, as well as for SQL 2008 SP1 up to CU5. That version includes various tricks to deal with the performance problems of static SQL solutions that are no longer an issue with SQL 2008, and therefore I have not included those tricks in this version.
In this text, I first look at a fairly common simple case of dynamic search conditions, which I’ve called “alternate key lookup”, where the more general methods overshoot the target. I then introduce a typical problem of general dynamic search conditions that serves as a case study when I later discuss the solutions for static and dynamic SQL in detail.
Here is a table of contents:
Introduction
Alternate Key Lookup
The Case Study: Searching Orders
The Northgale Database
Static SQL
A Tale of Service Packs and Cumulative Updates
Dynamic SQL
   Introduction
   Using sp_executesql
   Using the CLR
   Using EXEC()
   When Caching Is Not Really What You Want
Conclusion
Feedback and Acknowledgements
Revision History
Alternate Key Lookup
Using IF statements
Problems with dynamic search conditions come in several flavours. In the general case, there is a search form where the user can select between very many search conditions, and this is also the main focus of this article. But sometimes you encounter problems where there are only two or three conditions. As a typical example, assume a form where a user can look up a customer by entering one of: 1) the customer’s name, 2) the customer number, or 3) the customer’s national registration number. (That is, what is called SSN, personnummer etc. depending on where you are.) Dynamic SQL for a simple case like this is not worth the hassle, and OPTION (RECOMPILE) forces a compilation every time, which is not good if there are many of these lookups every second. So for this problem, a simple-minded approach is preferable:
IF @custno IS NOT NULL
   SELECT ... FROM customers WHERE custno = @custno
ELSE IF @natregno IS NOT NULL
   SELECT ... FROM customers WHERE natregno = @natregno
ELSE IF @custname IS NOT NULL
   SELECT TOP 200 ...
   FROM   customers
   WHERE  custname LIKE @custname + '%'
   ORDER  BY custname
ELSE
   RAISERROR('No search condition given!', 16, 1)
(The TOP 200 for the search on customer name limits the output, in case the user enters a very short search string, so that we don’t return tens of thousands of customers.)

If you need to return data from other tables as well, and you don’t want to repeat the join, you can use a table variable into which you insert all matching customer numbers, and then do your final join:
DECLARE @cust TABLE (custno int NOT NULL PRIMARY KEY)  -- adjust the type to your schema

IF @custno IS NOT NULL
   INSERT @cust (custno) VALUES (@custno)
ELSE IF @natregno IS NOT NULL
   INSERT @cust (custno) SELECT custno FROM customers WHERE natregno = @natregno
ELSE IF @custname IS NOT NULL
   INSERT @cust (custno)
      SELECT TOP 200 custno
      FROM   customers
      WHERE  custname LIKE @custname + '%'
      ORDER  BY custname
ELSE
   RAISERROR('No search condition given!', 16, 1)

SELECT ...
FROM   @cust c
JOIN   customers cst ON cst.custno = c.custno
JOIN   ...
The assumption here is that all columns are indexed, so that there is an obvious query plan for all three cases. However, recall that SQL Server builds the query plan for a procedure the first time it is executed, whereupon the optimizer “sniffs” the input parameters. This means that if the first search is on customer number, SQL Server will compile the branches for customer name and national registration number for a NULL in their search parameters. Particularly in the case of the customer name, this could lead to SQL Server choosing a table scan, which you have all reason to avoid.

There are a couple of precautions you can take. One is to push the three SELECT statements down into subprocedures, but admittedly this is a bit bulky. Another approach is to add explicit index hints, but index hints are always something you should be careful with. What if someone renames the index? That would cause the query to fail. Instead, the best option is probably to use the OPTIMIZE FOR hint:
SELECT TOP 200 custno
FROM   customers
WHERE  custname LIKE @custname + '%'
ORDER  BY custname
OPTION (OPTIMIZE FOR (@custname = N'ZZZZZZZ'))

This hint causes SQL Server to build the query plan for the value you specify. All you need to do is to find a value which is selective enough.
Whatever strategy you choose, you should test on production-size data to verify that you get the plans you expect. Due to the sniffing issue, your test should look something like this:
EXEC find_customer @custno = 123
EXEC find_customer @natregno = '1234567890'
EXEC find_customer @custname = 'ABC'
EXEC sp_recompile find_customer   -- flush the plan for the procedure
EXEC find_customer @natregno = '1234567890'
EXEC find_customer @custno = 123
EXEC find_customer @custname = 'ABC'
EXEC sp_recompile find_customer
EXEC find_customer @custname = 'ABC'
EXEC find_customer @custno = 123
EXEC find_customer @natregno = '1234567890'

That is, you should test with each of the three parameters as the “sniffed” parameter when the plan is built.
In this particular example, there is one more issue with the @custname parameter: the user could add a leading %, in which case a scan would be a better choice. So far, I’ve tacitly assumed that there is something to preclude this, for instance a check in the GUI. But say that users need to be able to search by substring within the customer name. In this case, the best option is to split this into two branches:
IF left(@custname, 1) <> '%'
   -- query as above
ELSE
   -- same query, but with a different value in OPTIMIZE FOR
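Spelled out, the skeleton could look like this. (A sketch only: which values to optimise for depends on your data; the idea is to pick a selective value for the prefix branch, and a value that matches many rows for the substring branch.)

IF left(@custname, 1) <> '%'
   SELECT TOP 200 custno
   FROM   customers
   WHERE  custname LIKE @custname + '%'
   ORDER  BY custname
   OPTION (OPTIMIZE FOR (@custname = N'ZZZZZZZ'))
ELSE
   SELECT TOP 200 custno
   FROM   customers
   WHERE  custname LIKE @custname + '%'
   ORDER  BY custname
   OPTION (OPTIMIZE FOR (@custname = N'%'))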
Using OR
If you don’t like the multiple IF statements, you may be delighted to know that it is in fact perfectly possible to do it all in one query (as long as we ignore a leading % in @custname):

SELECT TOP 200 ...
FROM customers
WHERE (custno = @custno AND @custno IS NOT NULL) OR
(natregno = @natregno AND @natregno IS NOT NULL) OR
(custname LIKE @custname + '%' AND @custname IS NOT NULL)
ORDER BY custname
The WHERE clause here essentially reads:

custno = @custno OR natregno = @natregno OR custname LIKE @custname + '%'

I have added explicit tests on whether the input variables are NULL. This encourages SQL Server to go for a query plan that includes all three indexes and performs an index concatenation. The nice thing is that thanks to the conditions on the input variables, the plan will include filters with startup expressions, so that at run time SQL Server will only access one of the indexes. Without the checks on the input variables, you may still get the index concatenation, but the risk of a plain scan increases. And in any case, there will be no startup filters, so all indexes are always accessed.
This strategy usually works well, as long as the search terms are all in the same table and all have an index, but rarely (if ever) if the search terms are in different tables. In any case, you should test that you get the plan and performance you aim for.
The Case Study: Searching Orders
We will now turn to a more general case where there are many search terms. We will work with implementing a stored procedure that retrieves information about orders in the Northwind and Northgale databases, as well as the customers and the products of the orders. Northwind is a sample database that came with SQL 7 and SQL 2000, but it does not ship with SQL 2008; you can retrieve it from Microsoft’s web site. Northgale is a bigger database that I have derived from Northwind; more about it below.

This is the interface that we expose to the user (well, rather to a GUI or middle-layer programmer):
CREATE PROCEDURE search_orders
@orderid int = NULL,
@fromdate datetime = NULL,
@todate datetime = NULL,
@minprice money = NULL,
@maxprice money = NULL,
@custid nchar(5) = NULL,
@custname nvarchar(40) = NULL,
@city nvarchar(15) = NULL,
@region nvarchar(15) = NULL,
@country nvarchar(15) = NULL,
@prodid int = NULL,
@prodname nvarchar(40) = NULL AS
SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
c.PostalCode, c.Country, c.Phone, p.ProductID,
p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM Orders o
JOIN [Order Details] od ON o.OrderID = od.OrderID
JOIN Customers c ON o.CustomerID = c.CustomerID
JOIN Products p ON p.ProductID = od.ProductID
WHERE ???
ORDER BY o.OrderID
You see in the SELECT list what information the user gets. Here is a specification of the parameters:

| Parameter | Function |
|---|---|
| @orderid | Retrieve this order only. |
| @fromdate | Retrieve orders made on @fromdate or later. |
| @todate | Retrieve orders made on @todate or earlier. |
| @minprice | Retrieve only order details that cost at least @minprice. |
| @maxprice | Retrieve only order details that cost at most @maxprice. |
| @custid | Retrieve only orders from this customer. |
| @custname | Retrieve only orders from customers whose name starts with @custname. |
| @city | Retrieve only orders from customers in this city. |
| @region | Retrieve only orders from customers in this region. |
| @country | Retrieve only orders from customers in this country. |
| @prodid | Retrieve only order details with this product. |
| @prodname | Retrieve only order details with a product starting with @prodname. |
EXEC search_orders with no parameters at all should return all orders in the database.

This example is fairly simple, since each condition can be implemented with a single condition using =, <=, >= or LIKE. In many real-life situations you have more conditions that affect which tables or columns you need to access. I have purposely left out such conditions from search_orders in order to keep the focus on the main issues, and I only touch on more complex conditions in a few places. My hope is that when you have read this article, you will have firmer ground to stand on, so that you will be able to tackle the complex conditions you run into on your own.
Sometimes you may want to search for multiple values, for instance products 14, 56 and 76. The best way to handle this in SQL 2008 is to use a table-valued parameter. If you use ADO .Net this is very straightforward, as you can pass a DataTable or a List<SqlDataRecord>. You can also use table-valued parameters from a native application that uses the ODBC or OLE DB interfaces. However, if you use old ADO, table parameters are out of reach for you. In this case, you can pass a list of the input values in a string, and then unpack this string to a table using a table-valued function. I have an article Arrays and Lists in SQL Server 2005 on my web site that discusses several such functions.
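To give an idea of what a table-valued parameter looks like in T-SQL, here is a minimal sketch. (The type and procedure names are my own inventions for this example; they are not part of the search_orders suite.)

CREATE TYPE product_id_list AS TABLE (prodid int NOT NULL PRIMARY KEY)
go
CREATE PROCEDURE search_orders_by_products @prodids product_id_list READONLY AS
   SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity
   FROM   dbo.Orders o
   JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID
   WHERE  od.ProductID IN (SELECT prodid FROM @prodids)
go
-- Search for products 14, 56 and 76:
DECLARE @p product_id_list
INSERT @p (prodid) VALUES (14), (56), (76)
EXEC search_orders_by_products @prodids = @p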
It is not uncommon that there is a requirement for the user to be able to specify how the output is to be sorted. Since this article is long enough, I will not dwell on this topic more than this brief note. If you are using dynamic SQL, you would of course build the ORDER BY clause dynamically as well. If you are using static SQL, see this section in my general article on dynamic SQL for suggestions. I would also like to point to the possibility to do this client-side. There are grid components that permit the user to resort the data, without any requirement to requery the database.
In this text I discuss some different implementations of search_orders, unimaginatively named search_orders_1 etc. Some of them are included in whole in this text, others only in parts. All are available in the dynsearch-2008 directory on my web site. (The numbering of the procedures is quite out of order with the text, due to the historic evolution of this article.)
The Northgale Database
The Northwind database is very small, so it is very difficult to draw conclusions about the performance from queries in this database. A query that one would expect to use an index may in fact use a scan, simply because at those small volumes, the scan is cheaper.

For this reason I have composed the Northgale database. Northgale has the same tables and indexes as Northwind, but I have exploded the data so that instead of 830 orders, there are 344035 of them. To install Northgale, run Northgale.sql. The data in Northgale is taken from Northwind, so this database must be in place. To install Northgale, you need 4.6 GB of disk space. Once installed, the database takes up 520 MB. (Log space needed when inserting the data is reclaimed at the end of the script.) By default, the database is installed in the same directory as the master database, but you can edit the script to change that.
I’ve exploded the data by cross-joining the tables. For IDs, I’ve composed combinations, so for instance there are now 6640 customers rather than 91 as in the original Northwind. I have also generated new customer and product names by permuting the parts of the names. However, I have not created new cities, countries or regions, so a search on a city alone can yield very many hits.
Keep in mind that Northgale too is a small database by today’s standards. For instance, it easily fits entirely into cache on a reasonably equipped server. A poorly written query that requires a scan of, say, the Orders table still returns within a few seconds. It’s hopefully big enough to give a sense of how good or bad different solutions are, but I would advise you not to draw any far-reaching conclusions. It is also worth pointing out that the way the database was composed, the distribution of data is a bit skewed.
When you implement a dynamic search function, you should always benchmark your solution with your production database, as well as inspect the query plans for common search cases.
Static SQL
CAUTION! This section does not apply to all versions of SQL 2008, only to SQL 2008 SP1 CU5 (10.0.2746) and SQL 2008 R2 CU1 (10.50.1702) or later. See the section A Tale of Service Packs and Cumulative Updates for more details.

If we were to pursue the same strategy for search_orders as we did for alternate key lookup, we would get something like:
IF @orderid IS NOT NULL
   SELECT ...
   WHERE  o.OrderID = @orderid
ELSE IF @fromdate IS NOT NULL AND @todate IS NULL AND
        @custid IS NULL ...
   SELECT ...
   WHERE  o.OrderDate >= @fromdate
ELSE IF @fromdate IS NOT NULL AND @todate IS NOT NULL AND
        @custid IS NULL ...
As you realise, this code would be impossible to maintain (and very boring to write).

Thankfully, there is a solution that is straightforward and yet efficient, as illustrated by search_orders_3:
CREATE PROCEDURE search_orders_3
@orderid int = NULL,
@fromdate datetime = NULL,
@todate datetime = NULL,
@minprice money = NULL,
@maxprice money = NULL,
@custid nchar(5) = NULL,
@custname nvarchar(40) = NULL,
@city nvarchar(15) = NULL,
@region nvarchar(15) = NULL,
@country nvarchar(15) = NULL,
@prodid int = NULL,
@prodname nvarchar(40) = NULL AS
SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
c.PostalCode, c.Country, c.Phone, p.ProductID,
p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM Orders o
JOIN [Order Details] od ON o.OrderID = od.OrderID
JOIN Customers c ON o.CustomerID = c.CustomerID
JOIN Products p ON p.ProductID = od.ProductID
WHERE (o.OrderID = @orderid OR @orderid IS NULL)
AND (o.OrderDate >= @fromdate OR @fromdate IS NULL)
AND (o.OrderDate <= @todate OR @todate IS NULL)
AND (od.UnitPrice >= @minprice OR @minprice IS NULL)
AND (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
AND (o.CustomerID = @custid OR @custid IS NULL)
AND (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
AND (c.City = @city OR @city IS NULL)
AND (c.Region = @region OR @region IS NULL)
AND (c.Country = @country OR @country IS NULL)
AND (od.ProductID = @prodid OR @prodid IS NULL)
AND (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
ORDER BY o.OrderID
OPTION (RECOMPILE)
The effect of all the @x IS NULL clauses is that if that input parameter is NULL, then that AND condition is always true. Thus, the only conditions that are in effect are those where the search parameter has a non-NULL value.

As far as maintainability goes, it’s difficult to think of a better solution for the search conditions at hand. It’s compact, easy to read and to extend. And performance? Very good, as long as you include the query hint OPTION (RECOMPILE). This hint forces the query to be recompiled each time, in which case SQL Server will use the actual variable values as if they were constants. So if the procedure is called like this:
EXEC search_orders_3 @orderid = 11000

SQL Server will in essence optimise this WHERE clause:
WHERE (o.OrderID = 11000 OR 11000 IS NULL)
  AND (o.OrderDate >= NULL OR NULL IS NULL)
  AND (o.OrderDate <= NULL OR NULL IS NULL)
  AND (od.UnitPrice >= NULL OR NULL IS NULL)
  AND (od.UnitPrice <= NULL OR NULL IS NULL)
  AND (o.CustomerID = NULL OR NULL IS NULL)
  ...

This leads SQL Server to use the index on OrderID to drive the query. On the other hand, for this call:
EXEC search_orders_3 @custid = 'ALFKI'

the effective WHERE clause becomes:
WHERE (o.OrderID = NULL OR NULL IS NULL)
  AND (o.OrderDate >= NULL OR NULL IS NULL)
  AND (o.OrderDate <= NULL OR NULL IS NULL)
  AND (od.UnitPrice >= NULL OR NULL IS NULL)
  AND (od.UnitPrice <= NULL OR NULL IS NULL)
  AND (o.CustomerID = N'ALFKI' OR N'ALFKI' IS NULL)
  ...

and the optimiser decides that the index on CustomerID is good.
Here are some more test cases that you can try while looking at the query plans:
EXEC search_orders_3 @prodid = 76
EXEC search_orders_3 @prodid = 76, @custid = 'RATTC'
EXEC search_orders_3 @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_3 @city = 'Bräcke', @prodid = 76

On the other hand, if you fail to include the query hint, you will not get good performance at all. SQL Server will then pick a plan that works somewhat decently for the input when you run the procedure the first time, but it will not be an optimal plan. Since the plan is put into the cache, the plan must be one that produces the correct result with any input, although performance-wise it will not at all be suitable for other inputs. If you want to see this for yourself, comment out OPTION (RECOMPILE) and run the procedure again. For instance, first run the procedure with @orderid = 11000 and then with @prodid = 76, and you will notice that the latter search is a tad slow. And if you flush the plan (sp_recompile or recreate the procedure) and now run it first with @prodid = 76 and with @orderid = 11000 next, you may find that it takes over 20 seconds to retrieve that single order.
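In script form, the second experiment could look like this (assuming you have commented out OPTION (RECOMPILE) and recreated the procedure):

EXEC sp_recompile search_orders_3        -- make sure no plan is cached
EXEC search_orders_3 @prodid = 76        -- the plan is built and sniffed for this call
EXEC search_orders_3 @orderid = 11000    -- reuses that plan: painfully slow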
The RECOMPILE hint was added already in SQL 2005, but unfortunately it was not implemented appropriately. SQL 2005 merely sniffed the parameters as it does when it builds the plan for a stored procedure. That is, it still built a plan that worked with any input, which was overly conservative, since the plan is only to be used once, with the current values and no others. The net effect is that in SQL 2005, search_orders_3 results in query plans that are less than optimal.
Variation on this Theme
Rather than using OR like above, some people write:

o.OrderID = coalesce(@orderid, o.OrderID)

coalesce() is a function that takes a list of values as arguments and returns the first non-NULL value in the list, or NULL if there is no non-NULL value in the list. Thus, if @orderid is NULL, you get
o.OrderID = o.OrderID, a complete no-op. You can see a full example of this in search_orders_3a.

This yields code that is even more compact than using OR, but I strongly recommend that you stay away from this method, because there is a trap. Run this:
EXEC search_orders_3 @orderid = 10654
EXEC search_orders_3a @orderid = 10654

The first call returns three rows, but the second returns no rows at all! Why? Because for this customer, the column Region is NULL. When @region is NULL, the condition
c.Region = coalesce(@region, c.Region)

becomes in essence c.Region = NULL. But in SQL, NULL is not equal to NULL. NULL stands for “unknown value”, and whether two unknown values are the same or different is of course also unknown. Hence, no rows are returned.
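You can see this behaviour for yourself with a one-line test (assuming the default setting ANSI_NULLS ON):

SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'not equal' END
-- Returns 'not equal': the comparison yields UNKNOWN, not TRUE.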
This can be handled by writing the condition more carefully, for instance (c.Region = @region OR @region IS NULL), as in the original search_orders_3. But that begs the question why you should use the coalesce construct at all, when it only works under some conditions.

Optional Tables
Sometimes you may have search conditions that require you to access a table only if a certain condition is given. Let’s add one more parameter to our procedure: @suppl_country. If this parameter is provided, the procedure should only return information about products whose supplier comes from the given country. You could implement this by joining to the Suppliers table, but a suggestion that I originally got from Phillipp Sumi is to use an EXISTS clause in this way:

@suppl_country IS NULL OR EXISTS (SELECT *
                                  FROM   Suppliers s
                                  WHERE  s.SupplierID = p.SupplierID
                                    AND  s.Country = @suppl_country)
To illustrate this, I wrote search_orders_9, which for the sake of simplicity is an extension of search_orders_3. If you run this:

EXEC search_orders_9 @country = 'Sweden'
EXEC search_orders_9 @suppl_country = 'Sweden'

and then look at the query plans, you will see that the first plan does not include Suppliers at all. If you change the procedure and comment out OPTION (RECOMPILE), the query plan will of course include Suppliers, but there is still an interesting observation to make. First issue the command SET STATISTICS IO ON, and then run the procedures again. You will get this output for the first execution (I have abbreviated it here for the sake of space):
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
Table 'Products'. Scan count 3, logical reads 238, physical reads 0, read-ahead reads 0
Table 'Customers'. Scan count 3, logical reads 655, physical reads 0, read-ahead reads 0
Table 'Orders'. Scan count 80, logical reads 25035, physical reads 46, read-ahead reads 91
Table 'Order Details'. Scan count 3, logical reads 8957, physical reads 36, read-ahead reads 8896
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0

Suppliers is nowhere in sight, whereas the output for the second execution starts off:
Table 'Suppliers'. Scan count 0, logical reads 11858, physical reads 0, read-ahead reads 0

Thus, even without the RECOMPILE hint, SQL Server is able to avoid the access to Suppliers. If you have a background in a language like C++, you may think that this is just another case of operator shortcutting, but shortcutting does not exist in SQL. SQL Server will evaluate operands in the order which is the most efficient, and you get the same effect if you write the two OR conditions in reverse order. Instead, the secret is found in the query plan, where you will find a Filter operator with a Startup Expression, the same thing we encountered in the section Using OR for alternate key lookup.
What if you need to access data from different tables depending on input parameters? Say that there is a parameter @historic: if this parameter is 1, you should read the tables HistoricOrders and HistoricOrderDetails instead. (No, there are no such tables in Northwind; I’m just making them up for the sake of the example.) It would be possible to cram all that into one query, but I’d rather not write that query, and you’d rather not read it, believe me. As long as we are doing static SQL in a stored procedure, this is best handled with an IF statement and two distinct queries, as in the sketch below. But we are now straying into the area where dynamic SQL gives us a solution that is easier to write and maintain.
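A sketch of what that IF statement could look like (recall that the historic tables are fictitious):

IF @historic = 1
   SELECT ...
   FROM   dbo.HistoricOrders o
   JOIN   dbo.HistoricOrderDetails od ON o.OrderID = od.OrderID
   ...
   OPTION (RECOMPILE)
ELSE
   SELECT ...
   FROM   dbo.Orders o
   JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID
   ...
   OPTION (RECOMPILE)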
Performance: The Fine Print
Now with OPTION (RECOMPILE) working the way it should, does that mean we have the ultimate solution? Do we even need to bother about dynamic SQL for this kind of problem? Well, first of all, as I just pointed out, if the level of complexity increases so that you need to access different tables or use different operators, you will sooner or later reach a point where a solution in dynamic SQL is easier to implement and maintain.

But even from the point of view of performance, there are still some things to consider. After all, we are compiling the query each and every time a user makes a search. Say that users run search_orders once a minute in peak activity. In this case, the extra time we spend on compilation is no big deal. But assume instead that there are over 100 calls to the procedure every second, and assume furthermore that in the major share of these calls the only input parameter is @orderid. If you have the server compile this query over and over to produce the same query plan, you may bring the server to its knees.
A simple way out is to add an IF statement, so that you have:
IF @orderid IS NOT NULL
SELECT ...
WHERE o.OrderID = @orderid
AND -- Conditions on Order Details here.
-- No OPTION (RECOMPILE) here!
ELSE
SELECT ...
WHERE -- same conditions as before
OPTION (RECOMPILE)
By adding a separate branch for @orderid, you make sure that you use a cached plan in this common case. You may wonder how big the overhead of compiling a query really is. There is no simple answer to this, but in general, the simpler the query is, the bigger the relative overhead. A complex query may take several seconds to compile, but if the query then runs for several minutes, the compilation time is negligible. On the other hand, if the query executes in less than 10 ms but compilation takes 20 ms, the overhead is considerable.

Next you may identify that there are also many requests with @custid and @fromdate, with @fromdate typically being at most a week ago. You can still bump in another IF statement, and the procedure search_orders_4 illustrates where we are heading.
But this may also be as far as this path can take you. If you find two or three more common input combinations, this gets out of hand. And that’s where you need to look into dynamic SQL. “But would that not add even more compilation?” No, and here is the paradox: this solution with static SQL is more dynamic than dynamic SQL itself, in the sense that we get a new query plan each time. With dynamic SQL, you can implement a solution where each combination of input parameters has its own plan cached, and that is what we will look into next.
A Tale of Service Packs and Cumulative Updates
The new behaviour of OPTION (RECOMPILE) does not appear in all versions of SQL 2008. It first appeared in a late beta of SQL 2008, and it was included in the RTM version of SQL 2008. However, there was a serious bug: if two processes ran the same stored procedure in which there was a statement that had OPTION (RECOMPILE) attached to it, one process could get the results of the other process. When Microsoft investigated the bug, they realised that a correct fix would be difficult and carry a risk of regression bugs. Given that Service Pack 1 was around the corner, they decided to take the easy way out and restore the old behaviour from SQL 2005, and this is how SQL 2008 SP1 shipped. This also applies to the RTM version of SQL 2008 R2.

Eventually, though, Microsoft realised that there were customers who had built solutions around the new behaviour, and for whom the quick fix of reverting to the old behaviour was not acceptable. Therefore, Microsoft issued a hotfix which fixed the actual bug and restored the RTM behaviour, and this hotfix became available to the public with the release of Cumulative Update 5 for SP1. This fix was also included in SQL 2008 SP2, which was released at the end of September 2010.
Despite the fact that CU5 was released about six months before SQL 2008 R2, the fix was not included in the RTM version of SQL 2008 R2, but shortly after shipping R2, Microsoft released Cumulative Update 1 for SQL 2008 R2, which includes the hotfix. The fix is also included in Service Pack 1 for SQL 2008 R2, which was released in July 2011.
If you are not acquainted with the concept of Cumulative Updates, Microsoft releases these as a rollup of all hotfixes since the most recent general release, that is, RTM or a Service Pack. CUs are not as rigorously tested as Service Packs, and all announcements of a CU include this blurb:
A supported cumulative update package is now available from Microsoft. However, it is intended to correct only the problems that are described in this article. Apply it only to systems that are experiencing these specific problems. This cumulative update package may receive additional testing. Therefore, if you are not severely affected by any of these problems, we recommend that you wait for the next SQL Server 2008 service pack that contains the hotfixes in this cumulative update package.
To get access to a CU, you need to register and leave your email address, so if Microsoft becomes aware of a serious regression bug, they can contact you.
It follows from the above that you should not install a Cumulative Update on your production system just to get access to a new feature, unless you are in real need of it. Service packs undergo more thorough testing than CUs, so if you are on SQL 2008 SP1, you should rather upgrade to Service Pack 2 or later than install CU5. Likewise, if you are on SQL 2008 R2 RTM, I recommend that you install Service Pack 1 rather than CU1. KB article 975977 describes how to get access to CU5, and KB article 981355 covers CU1 for R2.
KB Article 976603 describes the fix as such, and also gives some information about the underlying cause of the bug. KB Article 986693 describes the original fix, that is, the revert to the old behaviour. You can also find details about the bug and a repro in the original Connect entry.
To determine whether your server has a correct implementation of OPTION (RECOMPILE), issue this command:
SELECT serverproperty('ProductVersion')
It should return 10.0.2746.0 (SQL 2008) or 10.50.1702.0 (SQL 2008 R2) or higher.

Dynamic SQL
Introduction
Performance
As you have gathered, for a dynamic search like our search_orders, there is no single plan that fits all combinations of input parameters. One plan which may give sub-second response when the user specifies the product may take two minutes when the user specifies the customer, or vice versa. In this case, a few extra milliseconds to build a query plan for each new combination of search parameters is a price well worth paying. With static SQL we achieved this by using the RECOMPILE hint to force a compilation every time. With dynamic SQL, we can take this one step further: we can get a plan for each different combination of input parameters that SQL Server stores in the cache, so that we don’t have to compile every time a user asks for a certain order id.

In previous releases of SQL Server, dynamic SQL was the best choice by far in terms of performance for dynamic search problems. In SQL 2008, dynamic SQL is still in the lead, but with the RECOMPILE hint working properly, the margin is slim and in many – but not all! – situations negligible.
Security
Way back in SQL 2000, there was a restriction that in some shops was a tremendous show-stopper for dynamic SQL: for dynamic SQL to work, users needed to be granted SELECT permission on the tables. You may recall that normally, when you grant users the right to run a stored procedure, the users acquire the right to access the tables through a mechanism known as ownership chaining, but this does not apply when you use dynamic SQL.

SQL 2005 added two more mechanisms to grant permissions through stored procedures, making permissions less of an issue for dynamic SQL than it used to be. You can sign the procedure with a certificate and then associate the certificate with a virtual user to which you grant the necessary permissions. You can also use the EXECUTE AS clause to impersonate such a virtual user, as sketched below. The latter solution is less hassle to implement, but it has side effects that can break row-level security schemes and make system monitoring more difficult. I describe both these methods in detail in my article Granting Permissions through Stored Procedures.
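As a taster, here is a minimal sketch of the EXECUTE AS variant. (The user and procedure names are made up for the example; see the article just mentioned for the full story, including the caveats.)

CREATE USER search_user WITHOUT LOGIN
GRANT SELECT ON dbo.Orders TO search_user
GRANT SELECT ON dbo.[Order Details] TO search_user
GRANT SELECT ON dbo.Customers TO search_user
GRANT SELECT ON dbo.Products TO search_user
go
CREATE PROCEDURE search_orders_dyn @orderid int = NULL
WITH EXECUTE AS 'search_user' AS
   -- The dynamic SQL now runs with the permissions of search_user.
   EXEC sp_executesql N'SELECT * FROM dbo.Orders WHERE OrderID = @orderid',
                      N'@orderid int', @orderid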
Admittedly, these two solutions are a little more complex than ownership chaining, and I hear from people who still prefer to avoid dynamic SQL because of the permissions issue. There is one more possibility: you can use a hybrid solution where you combine static and dynamic SQL using a view or an inline table function. I don’t discuss such solutions in this article, but the SQL 2005 version of this article includes the section Hybrid Solutions – Using Both Static and Dynamic SQL, if you want to explore that option.
Further Reading
There are more things to say about dynamic SQL as such that I will not go into here. Rather, I refer you to my article The Curse and Blessings of Dynamic SQL, where I discuss the use of dynamic SQL in general. In that article, I also cover the performance and security topics in more detail.

Testing is Necessary!
One potential problem with dynamic SQL is that you could slip somewhere, with the result that when the users specify a certain search criterion, their reward is a syntax error. Therefore, it is extremely important that you test all input parameters, and preferably some combinations of them too.

The Methods to Do Dynamic SQL
There are several ways to go:

- A T-SQL procedure that builds a parameterised query string and executes it with sp_executesql.
- A CLR procedure that builds a parameterised query string and executes it.
- Client-side code that builds a parameterised query string and executes it.
- A T-SQL procedure that builds a query string and executes it with EXEC().
- A CLR procedure that builds a query string by inlining all parameters.
- Client-side code that builds a query string by inlining all parameters.
The first three alternatives build parameterised query strings and are perfectly acceptable; which one you choose is to a large extent a matter of convenience. However, if you must work from the presumption that users do not have SELECT permission on the underlying tables, you need to use a stored procedure in T-SQL to arrange for the permissions.
I will discuss in detail an implementation of search_orders that uses sp_executesql, and I will also present two CLR implementations. I’m not presenting any client-side solutions, but the CLR solutions are equally applicable to .Net clients. I also describe an implementation of search_orders that uses EXEC(), as there may be a few special situations where it makes sense after all to inline certain parameters. This section also serves to show that inlining values is more difficult than using parameterised commands. I close the section on dynamic SQL by looking at situations where cached query plans hamper your performance.
Doing it Client-Side
As long as you use parameterised statements, it is in my opinion perfectly acceptable to implement dynamic searches client-side. I think you should stick to the same principles throughout your application, though. That is, if the application normally uses stored procedures, your dynamic search should also be in a stored procedure, be that T-SQL or CLR.

In any case, there is one thing you should absolutely never do: you should not build the WHERE clause in client code and pass that as a parameter to your stored procedure. Modules in a system should be as loosely coupled as possible, and having the query in the procedure with the WHERE clause built outside it is the total antithesis of this. For instance, if someone changes a table alias, what comes in the WHERE clause may no longer be valid. Sending a WHERE clause as a parameter also means that your procedure is wide open to SQL injection. Thus, if you decide that you prefer to build the WHERE clause in client code, this is where you should have your query as well. (If there are permissions issues you need to deal with, you could possibly handle this by creating a view. But never have WHERE clauses as parameters. Never!)
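To make the injection risk concrete, consider what a malicious caller could pass to a hypothetical procedure search_orders_bad that splices its @where parameter verbatim into the query:

EXEC search_orders_bad @where = N'1 = 1; DROP TABLE Orders --'
-- The string terminates the intended query and smuggles in a statement of its own.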
Using sp_executesql
sp_executesql is a system procedure that takes an SQL statement as its first parameter and a declaration of parameters as its second parameter; the remaining parameters are determined by that parameter list. Here is the procedure search_orders_1, which uses sp_executesql:

CREATE PROCEDURE search_orders_1 -- 1
@orderid int = NULL, -- 2
@fromdate datetime = NULL, -- 3
@todate datetime = NULL, -- 4
@minprice money = NULL, -- 5
@maxprice money = NULL, -- 6
@custid nchar(5) = NULL, -- 7
@custname nvarchar(40) = NULL, -- 8
@city nvarchar(15) = NULL, -- 9
@region nvarchar(15) = NULL, -- 10
@country nvarchar(15) = NULL, -- 11
@prodid int = NULL, -- 12
@prodname nvarchar(40) = NULL, -- 13
@debug bit = 0 AS -- 14
-- 15
DECLARE @sql nvarchar(MAX), -- 16
@paramlist nvarchar(4000) -- 17
-- 18
SELECT @sql = -- 19
'SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity, -- 20
c.CustomerID, c.CompanyName, c.Address, c.City, -- 21
c.Region, c.PostalCode, c.Country, c.Phone, -- 22
p.ProductID, p.ProductName, p.UnitsInStock, -- 23
p.UnitsOnOrder -- 24
FROM dbo.Orders o -- 25
JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID -- 26
JOIN dbo.Customers c ON o.CustomerID = c.CustomerID -- 27
JOIN dbo.Products p ON p.ProductID = od.ProductID -- 28
WHERE 1 = 1' -- 29
-- 30
IF @orderid IS NOT NULL -- 31
SELECT @sql = @sql + ' AND o.OrderID = @xorderid' + -- 32
' AND od.OrderID = @xorderid' -- 33
-- 34
IF @fromdate IS NOT NULL -- 35
SELECT @sql = @sql + ' AND o.OrderDate >= @xfromdate' -- 36
-- 37
IF @todate IS NOT NULL -- 38
SELECT @sql = @sql + ' AND o.OrderDate <= @xtodate' -- 39
-- 40
IF @minprice IS NOT NULL -- 41
SELECT @sql = @sql + ' AND od.UnitPrice >= @xminprice' -- 42
-- 43
IF @maxprice IS NOT NULL -- 44
SELECT @sql = @sql + ' AND od.UnitPrice <= @xmaxprice' -- 45
-- 46
IF @custid IS NOT NULL -- 47
SELECT @sql = @sql + ' AND o.CustomerID = @xcustid' + -- 48
' AND c.CustomerID = @xcustid' -- 49
-- 50
IF @custname IS NOT NULL -- 51
SELECT @sql = @sql + ' AND c.CompanyName LIKE @xcustname + ''%''' -- 52
-- 53
IF @city IS NOT NULL -- 54
SELECT @sql = @sql + ' AND c.City = @xcity' -- 55
-- 56
IF @region IS NOT NULL -- 57
SELECT @sql = @sql + ' AND c.Region = @xregion' -- 58
-- 59
IF @country IS NOT NULL -- 60
SELECT @sql = @sql + ' AND c.Country = @xcountry' -- 61
-- 62
IF @prodid IS NOT NULL -- 63
SELECT @sql = @sql + ' AND od.ProductID = @xprodid' + -- 64
' AND p.ProductID = @xprodid' -- 65
-- 66
IF @prodname IS NOT NULL -- 67
SELECT @sql = @sql + ' AND p.ProductName LIKE @xprodname + ''%''' -- 68
-- 69
SELECT @sql = @sql + ' ORDER BY o.OrderID' -- 70
-- 71
IF @debug = 1 -- 72
PRINT @sql -- 73
-- 74
SELECT @paramlist = '@xorderid int, -- 75
@xfromdate datetime, -- 76
@xtodate datetime, -- 77
@xminprice money, -- 78
@xmaxprice money, -- 79
@xcustid nchar(5), -- 80
@xcustname nvarchar(40), -- 81
@xcity nvarchar(15), -- 82
@xregion nvarchar(15), -- 83
@xcountry nvarchar(15), -- 84
@xprodid int, -- 85
@xprodname nvarchar(40)' -- 86
-- 87
EXEC sp_executesql @sql, @paramlist, -- 88
@orderid, @fromdate, @todate, @minprice, -- 89
@maxprice, @custid, @custname, @city, @region, -- 90
@country, @prodid, @prodname -- 91
In case you are reading this with a narrow browser window, I should point out that there are line numbers to the right that I will refer to in the following text.

Overall Flow
On lines 19-29, I compose the basic SQL string. The condition WHERE 1 = 1 on line 29 is there to permit the users to call the procedure without specifying any parameters at all. Then on lines 31-68, I check all parameters (save @debug), and if a parameter is non-NULL, I add a condition for the corresponding column to the SQL string. Finally, on line 70, I add the ORDER BY clause.
On line 72, I test the @debug parameter. If 1, I print the SQL string. This is handy if the dynamic SQL yields a compilation error that I don’t understand; once I see the SQL code, the error might be apparent. A typical error is to miss a space, leading to code that reads:
WHERE 1 = 1 AND o.OrderDate <= @xtodateAND p.ProductName LIKE @xprodname

On lines 75-86 I declare the parameter list for my dynamic SQL statement, and on lines 88-91 I finally execute it.
A Little More in Detail on sp_executesql
sp_executesql is a system procedure with a very special parameter list. The first parameter is a parameterised SQL statement. The second parameter is a parameter-list declaration, very similar to the parameter list of a stored procedure. And the remaining parameters are simply the parameters defined by that parameter-list parameter.

The parameterised query is placed in cache, so if a second user makes a search using the same input parameters (for instance @city and @prodname), the query plan will be reused, even if he uses different values in his search.
The parameter-list parameter that I pass to sp_executesql is basically the same as the parameter list of the procedure itself. Here, I have called the parameters @xorderid and so on. There is not any technical reason for this, and normally I would have re-used the names in the parameter list of the stored procedure. But I wanted to make you aware that the parameters inside the dynamic SQL have no relation to the parameters and variables in the surrounding stored procedure. The dynamic SQL constitutes a scope of its own. Think of it as a stored procedure that you create on the fly.
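You can see the separate scope for yourself: a variable declared in the surrounding batch or procedure is not visible inside the dynamic SQL unless you pass it as a parameter:

DECLARE @orderid int = 11000
EXEC sp_executesql N'SELECT * FROM dbo.Orders WHERE OrderID = @orderid'
-- Error: Must declare the scalar variable "@orderid".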
The SQL statement and the parameter must be Unicode strings, so the @sql and @paramlist variables must be nvarchar. And if you pass the SQL statement or the parameter-list as literals, you must use N (for National) to denote a Unicode literal:
EXEC sp_executesql N'SELECT * FROM Orders WHERE OrderID = @orderid',
N'@orderid int', @orderid = 10872
You can use OUTPUT parameters too with sp_executesql. I’m not showing any example of this here, but see the section on sp_executesql in Curse and Blessings…

You can see that on line 16, I have declared @sql as nvarchar(MAX). By using a MAX string, I ensure that I don’t outgrow the string and truncate the SQL command.
The dbo Prefix
On lines 25-28 there is something very important:

FROM dbo.Orders o
JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID
JOIN dbo.Customers c ON o.CustomerID = c.CustomerID
JOIN dbo.Products p ON p.ProductID = od.ProductID

As you can see, I refer to all tables in two-part notation. That is, I also specify the schema. Way back in SQL 2000, each user had his own default schema, and if an object was not found in that schema, SQL Server would next look in the dbo schema. In SQL 2008, it is perfectly possible that all users have dbo as their default schema, but there is little reason to rely on that. If two users have different default schemas, and the query includes a table specified without schema, the users cannot share the same cache entry, as potentially the table name could refer to different tables in different schemas.
Double Feature
You may wonder why there are two conditions each on the order id, customer id and product id (lines 32-33, 48-49 and 64-65). This is because I once learnt that you should always specify all conditions you know of to help the optimizer out. That was very long ago, and the rule may no longer apply, but I have included the extra conditions here nevertheless.

Picking Different Tables
Assume for the example that there had been a table Historicorders in Northwind for orders that are shipped, invoiced and paid for, and users should be given the choice to search either current or historic orders. (For simplicity, I’m overlooking Order Details here.) How would you solve this?

You may be tempted to try something like
SELECT ... FROM @ordertable and then add @ordertable as a parameter. However, T-SQL does not permit you to parameterise the table name. (And for good reasons, as I discuss in Curse and Blessings…) In fact, you should not pass the table name as a parameter to the procedure at all; once you have started to use stored procedures, all references to table and column names should be in the procedures themselves. The correct way would be to add another parameter to search_orders:

@historic_data bit = 0

and then line 25 would read:
FROM dbo.' + CASE @historic_data
WHEN 0 THEN 'Orders'
WHEN 1 THEN 'Historicorders'
END + ' o
Trying it out
Here are some test cases:

EXEC search_orders_1 @orderid = 11000
EXEC search_orders_1 @custid = 'ALFKI'
EXEC search_orders_1 @prodid = 76
EXEC search_orders_1 @prodid = 76, @custid = 'RATTC'
EXEC search_orders_1 @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_1 @city = 'Bräcke', @prodid = 76

If you try these and inspect the query plans, you will see that the available indexes on the search columns are used in Northgale, with one exception: the index on Customers.City is not used for the last case, but this column is not very selective. If you run the queries in Northwind, you will see more scans, but this is due to the small size of that database. If you compare with the plans for search_orders_3, you will see that they are identical.
Using the CLR
SQL 2005 added the possibility to write stored procedures in languages that use the CLR (Common Language Runtime), such as Visual Basic .Net or C#. A dynamic search can be implemented in the CLR just as well as in T-SQL. After all, search_orders_1 is all about string manipulation until it invokes sp_executesql.

If you are more comfortable working in VB or C#, you may prefer to implement your searches in the CLR rather than T-SQL. The reverse also applies: if you feel more comfortable with T-SQL, there is little reason to use the CLR to implement dynamic searches.
I’ve written two CLR procedures, search_orders_vb and search_orders_cs, that I will discuss in this section. As the code is fairly repetitive, I’m not including any of them in full here, but I only highlight some important points. Beware that I will not go into any details on writing CLR stored procedures as such. If you have never worked with the CLR before, but are curious, I refer you to Books Online. At the end of this section there are instructions on how to create these two procedures in SQL Server.
Setting up the Statement
This is how search_orders_cs starts off:

string Query;
SqlCommand Command = new SqlCommand();
Query = @"SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
c.CustomerID, c.CompanyName, c.Address, c.City,
c.Region, c.PostalCode, c.Country, c.Phone,
p.ProductID, p.ProductName, p.UnitsInStock,
p.UnitsOnOrder
FROM dbo.Orders o
JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID
JOIN dbo.Customers c ON o.CustomerID = c.CustomerID
JOIN dbo.Products p ON p.ProductID = od.ProductID
WHERE 1 = 1 ";
As you can see, this is very similar to search_orders_1, including the dbo prefix. The rule that you should use two-part notation to maximise query-plan reuse applies to CLR procedures as well.

Defining the Parameters
This is the very important part. Far too often on the newsgroups, I see posters who inline the parameter values into their query strings. As I have discussed, this is bad for several reasons: you get very little chance for query-plan reuse, and you open the gates for SQL injection, which I discuss in Curse and Blessings…

What you should do is to build parameterised statements. Here is how the @custid parameter is added in search_orders_cs:
if (! Custid.IsNull) {
Query += " AND o.CustomerID = @custid" +
" AND c.CustomerID = @custid";
Command.Parameters.Add("@custid", SqlDbType.NChar, 5);
Command.Parameters["@custid"].Value = Custid;
Command.Parameters["@custid"].Direction = ParameterDirection.Input;
}
As in the T-SQL example, the query string is extended with the conditions for the parameter in both Orders and Customers.

What is different from T-SQL is how we define the parameter list and supply the value. In T-SQL the parameter list is a string, which includes all possible parameters. When working with the CLR, we only define the parameters that actually are in use. We define a parameter by adding it to the Parameters collection of the Command object. There are a number of ways to do this, and I refer you to MSDN Library for a complete reference. The example shows a pattern that works for the most commonly used data types. The first argument is the parameter name, while the second argument is the type indicator from the SqlDbType enumeration. The last argument is the length, which you need to specify for the char, varchar, nchar, nvarchar, binary and varbinary data types, but which you leave out for fixed-length types. Note that for decimal/numeric parameters, you need to add them in some different way, as this overload has no means to specify precision and scale.
Once the parameter is defined, I assign the value separately. I also explicitly specify the direction for clarity.
Here is the above in Visual Basic .Net:
If Not Custid.IsNull Then
Query &= " AND o.CustomerID = @custid" & _
" AND c.CustomerID = @custid" & VbCrLf
Command.Parameters.Add("@custid", SqlDbType.NChar, 5)
Command.Parameters("@custid").Value = Custid
Command.Parameters("@custid").Direction = ParameterDirection.Input
End If
It’s very similar to the C# example: a different operator for string concatenation, parentheses to address elements in the collection, and no semicolons.

Don’t Forget to Specify the Length!
There is one thing about the parameter definition that I’d like to highlight:

Command.Parameters.Add("@custname", SqlDbType.NVarChar, 40);
I explicitly specify the length of the string parameter. ADO .Net permits you to leave out the length when you add the parameter. ADO .Net also supplies the method AddWithValue, which permits you to define a parameter and provide the value in a single call. Do not fall into the trap of using these shortcuts! The reason these alternatives are bad is that when ADO .Net constructs the call to sp_executesql for you, it will use the length of the actual parameter value when it builds the parameter list. Thus, if one user enters Alfred, the parameter will be declared as:

@custname nvarchar(6)

But if another user enters Berglund, the parameter will be declared as:

@custname nvarchar(8)

When SQL Server looks up a query in the cache, it hashes the query text and the parameter list and performs a lookup on that hash value. That is, differences in the parameter list will result in different cache entries. The net effect is that you will have less benefit from the cache, and the risk that you contribute to cache bloat increases. Under extreme circumstances, the sheer number of plans in the cache can lead to degraded performance.
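You can observe the effect directly. These two calls are textually identical except for the declared parameter length, and they get two separate entries in the plan cache:

EXEC sp_executesql N'SELECT * FROM dbo.Customers WHERE CompanyName LIKE @custname + ''%''',
                   N'@custname nvarchar(6)', N'Alfred'
EXEC sp_executesql N'SELECT * FROM dbo.Customers WHERE CompanyName LIKE @custname + ''%''',
                   N'@custname nvarchar(8)', N'Berglund'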
If you feel that you don’t want to hardcode the length of the column in case it could change in the future, rather than leaving out the length, use the maximum length for the type, that is 8000 for char, varchar, binary and varbinary and 4000 for nchar and nvarchar.
Running the Query
This is how it looks in C#:

using (SqlConnection Connection =
new SqlConnection("context connection=true"))
{
Connection.Open();
if (Debug) {
SqlContext.Pipe.Send(Query);
}
Command.CommandType = CommandType.Text;
Command.CommandText = Query;
Command.Connection = Connection;
SqlContext.Pipe.ExecuteAndSend(Command);
}
This is very much the standard way to run a query from a CLR procedure. SqlContext.Pipe.Send is how you say PRINT in the CLR.

Loading the Examples
If you have any flavour of Visual Studio 2005 or later (including the Express editions), you can deploy search_orders_cs and search_orders_vb from Visual Studio. (But please don’t ask me how to do it; Visual Studio just leaves me in a maze.)

Since the .Net Framework comes with SQL Server and includes compilers for the most common .Net languages, you can also load them without Visual Studio. First make sure that C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 (or corresponding) is in your path. Then run from a command-line window:
csc /target:library search_orders_cs.cs
vbc /target:library search_orders_vb.vb

To load the DLLs into SQL Server, you can use load_clr_sp.sql. You will have to change the path in the CREATE ASSEMBLY command to where you placed the DLLs. Note that the paths are as seen from SQL Server, so if you don’t have SQL Server on your local machine, you will have to copy the DLLs to the SQL Server box, or specify a UNC path to your machine.
Trying it out
The same test cases as for search_orders_1:

EXEC search_orders_cs @orderid = 11000
EXEC search_orders_cs @custid = 'ALFKI'
EXEC search_orders_cs @prodid = 76
EXEC search_orders_cs @prodid = 76, @custid = 'RATTC'
EXEC search_orders_cs @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_cs @city = 'Bräcke', @prodid = 76

The query plans should be identical to those for search_orders_1, as the queries are the same. Hint: if you want to look at the query plans, you need to use Profiler and get the Showplan XML event. For some reason, you don’t see query plans for queries submitted from CLR code in Management Studio or Query Analyzer.
Using EXEC()
As I’ve already mentioned, EXEC() is an inferior solution to sp_executesql and the CLR. Since EXEC() does not take parameters, you have to build a query string with the parameter values inlined. This precludes query plans from being reused, and it also makes the programming more difficult.

Nevertheless, occasionally there can be reason to inline a certain parameter for performance reasons, as I will discuss in the next section. The procedure search_orders_2 inlines all parameter values and runs the search query with EXEC(). The idea is not that you should ever implement your dynamic search this way. But if you need to inline an occasional parameter, there is good reason to look at search_orders_2, so that you avoid the pitfalls of inlining parameter values.
I don’t include the procedure in full here, but only comment on certain parts.
General Notes
When building a non-parameterised query with the values included, you need to be disciplined when you write your code. It’s very easy to get lost in a maze of nested quotes. I often see people on the newsgroups posting code like:

EXEC('SELECT col1, col2, ...
      FROM ...
      WHERE ' + CASE WHEN @par1 IS NOT NULL
                     THEN ' col = ''' + @par1 + '''' ELSE '' END + ...)
This is difficult to read and maintain, and if it goes wrong, you have no idea what SQL you are actually generating. search_orders_2 aggregates the SQL code into a variable, and there is a @debug parameter so that I can see the SQL code if I need to verify that I am generating the right thing. And by using the function quotename(), I have tried to reduce the need for nested quotes to a minimum. (More about this function just below.)

Keep in mind that EXEC() opens a new scope. The SQL you generate cannot refer to variables declared in the surrounding procedure, and if the dynamic SQL creates a temp table, that table goes away with the scope.
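For instance, a temp table created inside EXEC() is gone by the time the next statement runs:

EXEC('CREATE TABLE #temp (a int)')
SELECT * FROM #temp   -- fails: #temp vanished when the EXEC() scope ended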
The Use of Quotename()
quotename() appears commonly in this procedure, as in this example:

SELECT @sql = @sql + ' AND c.City = ' + quotename(@city, '''')
I originally got this idea from SQL Server MVP Steve Kass. quotename() takes two parameters: a string and a delimiter character. If the delimiter is (, [, < or {, the corresponding bracket is assumed to be the closing delimiter. The return value is the string enclosed by the delimiters, and any closing delimiter in the string is doubled. The default delimiter is square brackets. Examples:

SELECT quotename('Order Details')                      -- => [Order Details]
SELECT quotename('Two o' + char(39) + 'clock', '''')   -- => 'Two o''clock'
SELECT quotename('abc{}def', '{')                      -- => {abc{}}def}
Of these examples, the interesting one in this context is the second. In T-SQL you use single quotes to delimit strings. If you need to include the string delimiter in a literal, you double it, so those four single quotes in a row are a one-character literal with the value of a single quote ('). Alternatively, you can express this value as char(39), as I do when I form the string Two o'clock in the example above.

Now, why all this? There are three important reasons:
- It makes the code much easier to write. If you don't use quotename(), you need to use nested quotes, and the code becomes really messy.
- This permits the user to enter values such as Let's Stop N Shop for the company name (this is an actual customer in the Northwind database) without getting a syntax error.
- This increases your protection against SQL injection.
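To see the protection in action, consider a hypothetical hostile input (my example, not from the procedure):

-- The user typed: Bräcke' OR 1=1 --
-- quotename() doubles the embedded quote, so the value remains one single
-- string literal, and the injection attempt never escapes it.
DECLARE @city nvarchar(60)
SELECT @city = N'Bräcke'' OR 1=1 --'
SELECT quotename(@city, '''')       -- => 'Bräcke'' OR 1=1 --'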
Datetime Parameters
All non-string parameters must be converted to string literals, and for the datetime parameters I use the expression:

convert(char(23), @fromdate, 126)

I did not pick format code 126 on a whim. The resulting string looks like this: 2003-04-06T21:14:26.627, which is the format prescribed by the ISO 8601 standard and commonly used in XML. More importantly, it is one of the three formats for datetime literals in SQL Server whose interpretation does not depend on the settings for date format and language. If you don't understand what I'm talking about, try these statements:
SET DATEFORMAT mdy
SELECT convert(datetime, '02/07/09')
SET DATEFORMAT dmy
SELECT convert(datetime, '02/07/09')
go
SELECT convert(datetime, '2002-12-19')    -- Fails!
go
SET LANGUAGE Swedish
SELECT convert(datetime, 'Oct 12 2003')   -- Fails! (It's "Okt" in Swedish.)

Besides the SET commands, the language setting (which also controls the date format) can be set per user with the system procedure sp_defaultlanguage. When you work with datetime strings in T-SQL, you always need to be careful to use a safe format to avoid nasty surprises. (Even better, of course, is to stick to parameters and avoid strings altogether.)
For further information about the various style parameters you can give to convert(), see the topic CAST and CONVERT in the T-SQL Reference of Books Online.
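The date conditions are thus added along these lines (a sketch of mine, not the procedure verbatim):

-- Produces e.g. AND o.OrderDate >= '1998-02-05T00:00:00.000', a literal that
-- is interpreted the same way regardless of DATEFORMAT and language settings.
DECLARE @sql nvarchar(MAX) = '',
        @fromdate datetime = '19980205',
        @todate   datetime = '19980209'
IF @fromdate IS NOT NULL
   SELECT @sql = @sql + ' AND o.OrderDate >= ' +
                 quotename(convert(char(23), @fromdate, 126), '''')
IF @todate IS NOT NULL
   SELECT @sql = @sql + ' AND o.OrderDate <= ' +
                 quotename(convert(char(23), @todate, 126), '''')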
Numeric Parameters
I handle the money and int parameters in the procedure differently:

@minpricestr = convert(varchar(25), @minprice)

and

SELECT @sql = @sql + ' AND od.ProductID = ' + str(@prodid)

To be honest, there is no really good reason for this; I could have used convert() for @orderid and @prodid too. It is just that I find str() a little more convenient for integer values. str() works with money too, but with no further parameters the value is rounded to the nearest whole number, whereas convert() by default retains two decimals, which is why I preferred convert() for the @min/maxprice parameters.
I'm not discussing the float and decimal data types here. Refer to the topics CAST and CONVERT and STR in Books Online.
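A quick demonstration of the difference between the two (my example):

DECLARE @prodid int, @minprice money
SELECT @prodid = 76, @minprice = 10.75
SELECT str(@prodid)                      -- '        76' (right-adjusted, default width 10)
SELECT convert(varchar(25), @minprice)   -- '10.75' (two decimals retained)
SELECT str(@minprice)                    -- '        11' (rounded to a whole number)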
varchar and nvarchar
The Northwind database consistently uses the nvarchar data type, but I have deliberately not handled this in my procedure. The data in Northwind appears to be restricted to the characters in Windows Latin-1, which covers languages such as English, Spanish, French and German, so if you have a system collation based on this character set, you would never get any incorrect response from search_orders_2. However, a Russian user trying:

EXEC search_orders_2 @city = N'Bräcke'

would not get the orders from Folk och Fä HB in Bräcke, because Bräcke would be converted to varchar as Bracke. This is how my procedure should have read (lines 71-72):
SELECT @sql2 = @sql2 + ' AND c.City = N' + quotename(@city, '''')
That is, I should have added an N to make the resulting SQL read c.City = N'Bräcke'. That N makes the literal a Unicode literal of the nvarchar data type, and it should appear with all nvarchar parameters. (The N stands for National, and is taken from the ANSI standard SQL-92.)

So why did I not give you the proper code? Well, I figured that many readers would take my code as a template for their own, and I don't want you to include that N by routine. Because, if you use an nvarchar literal together with an indexed varchar column, the varchar column will be converted to nvarchar, and that means that the index will be used less efficiently or not at all. If you use a Windows collation, SQL Server will still seek the index, but in a more roundabout way, resulting in doubled or tripled execution times. If the indexed column has an SQL collation, the index is completely useless; you will get a less efficient plan, and the execution time can increase a hundred- or thousand-fold.
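So the rule of thumb is to match the literal to the data type of the column. A sketch of mine with hypothetical column names (t.nvcol being nvarchar, t.vcol being varchar):

DECLARE @sql  nvarchar(MAX) = '',
        @par1 nvarchar(40)  = N'Bräcke',
        @par2 varchar(40)   = 'Bracke'
-- nvarchar column: use a Unicode literal.
SELECT @sql = @sql + ' AND t.nvcol = N' + quotename(@par1, '''')
-- varchar column: no N, so the indexed column is not converted to nvarchar.
SELECT @sql = @sql + ' AND t.vcol = ' + quotename(@par2, '''')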
Summary
Just to make the point once more, here is a summary of the advantages of using sp_executesql over EXEC():

- Minimises the need to use nested quotes.
- Minimises the risk for SQL injection.
- No need to worry about the formatting of datetime and numeric parameters.
- No need to think about whether to use N before string literals or not. (But of course, in the parameter list to sp_executesql you need to make the correct choice of data type.)
- Query plans for the same set of input parameters can be reused from cache.
When Caching Is Not Really What You Want
Thus far I have preached the virtue of caching, but there are situations where plan caching works against you. If two users search with the same parameters but different values, they get the same plan with search procedures like search_orders_1 or search_orders_cs that use parameterised queries. But consider these two invocations:

EXEC search_orders_1 @custid = 'ERNTC',
     @fromdate = '19980205', @todate = '19980205'
EXEC search_orders_1 @custid = 'BOLSR',
     @fromdate = '19980101', @todate = '19981231'
In Northgale, ERNTC is the top customer with 572 orders, whereas there is one single order for BOLSR.
As you may guess, the best query plans for these two invocations are not the same. When searching for orders from a frequent customer in a short time span, it's probably better to use the index on OrderDate, but when searching for orders from an infrequent customer in a broad time span, the index on CustomerID is likely to be the winner. How to deal with this? I will discuss a couple of tricks that you can employ.

Inlining Some Values
While inlining all parameter values into the string is a bad idea, it can sometimes make sense to include some parameter values directly. Say that the search had included a @status parameter, and that there were only four possible values for Orders.Status, of which Completed accounted for 95% of the rows. A typical search may be for new orders, less than 1% of the rows. In this case, it would be a very good idea to include the value of the @status parameter directly in the query string, since different status values could call for different query plans. This applies not least if there is an index on the Status column, and even more if the index on Status is filtered with the condition WHERE Status <> 'Completed'. (Filtered indexes are a new feature in SQL 2008 that permits you to add a simple WHERE clause to an index, in which case the index is built only on the rows that satisfy the WHERE clause.)

What about the situation above, the combination of customer and date interval? Inlining the customer id into the string is probably a bad idea. That's one query plan per customer, so you would reduce the reuse of the cache drastically. The date interval is a better candidate, assuming that users tend to use the same intervals during a day, for instance the last seven days. You would still add quite a few more query plans to the cache, but old plans for last week would age out with time, so unless the usage pattern is extremely diverse, this could work out.
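A sketch of what the inlining could look like; keep in mind that the Status column and the @status parameter are hypothetical and exist neither in Northwind nor in Northgale:

DECLARE @sql nvarchar(MAX) = 'SELECT o.OrderID FROM dbo.Orders o WHERE 1 = 1',
        @status nvarchar(20) = N'New'
-- Inline only the highly selective parameter; everything else stays
-- parameterised, giving one cached plan per status value, which is the point.
IF @status IS NOT NULL
   SELECT @sql = @sql + ' AND o.Status = ' + quotename(@status, '''')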
Changing the Query Text
As I discussed previously, SQL Server looks up queries in the cache by hashing the query text. This means that two queries with different text are different entries in the cache, even if they are logically equivalent. There are many ways to alter the query text; for instance, you could do something like:

IF @fromdate IS NOT NULL AND @todate IS NOT NULL
BEGIN
   SELECT @sql = @sql + CASE WHEN @fromdate = @todate
                             THEN ''
                             WHEN datediff(DAY, @fromdate, @todate) <= 7
                             THEN ' AND 2 = 2 '
                             WHEN datediff(DAY, @fromdate, @todate) <= 30
                             THEN ' AND 3 = 3 '
                             ...
The advantage of changing the query text depending on the parameter values, rather than including the values in the query string, is that you get fewer plans in the cache, but it also means that in some cases you will not run with the best plan. To wit, you can only guess where the breakpoint between two plans lies. With some luck the damage from this is limited: even if the plan for a single day is not the best for the span of a week, it may still be decently useful. But you will have to know your data, and possibly tune as time goes on.

Could this technique also be used to deal with the fact that different customers can have a very different number of orders? Probably not. Counting the number of orders for a customer before we construct the query is taking it too far in my opinion, and it could cost more than what you save in the other end.
Above, I altered the query text by adding extra conditions, but it works equally well to add extra spacing or extra comments, since SQL Server looks up the query before it parses it. And since the parameter list is part of the query, you can play with that one too, if you want to show off and make your code really obscure.
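For instance, a comment varies the text just as well as a dummy condition (my variation of the example above):

DECLARE @sql nvarchar(MAX) = 'SELECT o.OrderID FROM dbo.Orders o WHERE 1 = 1',
        @fromdate datetime = '19980205',
        @todate   datetime = '19980209'
-- The comment changes the hash of the query text, so each span class gets
-- its own cache entry, while the query itself is unchanged.
SELECT @sql = @sql + CASE WHEN @fromdate = @todate
                          THEN ' /* day */ '
                          WHEN datediff(DAY, @fromdate, @todate) <= 7
                          THEN ' /* week */ '
                          ELSE ' /* longer */ '
                     END
PRINT @sql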
Index Hints
I added this section because I figured that sooner or later someone would ask "Could you not use an index hint?". Surely, index hints can be useful in some situations. For instance, if the user provides an order id, we may want the optimizer to always use PK_Orders. But force an index because of a certain input value? I find it difficult to see many situations where this would be a fruitful strategy. As always, only add index hints if you have identified a situation where the optimizer picks the wrong plan and no other workaround is available.
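For completeness, this is what it could look like in the generated string; a sketch of mine only, to be used when you have actually identified such a situation:

DECLARE @sql nvarchar(MAX) = 'SELECT o.OrderID, o.OrderDate',
        @orderid int = 11000
-- Force PK_Orders only when an order id was given.
SELECT @sql = @sql +
       CASE WHEN @orderid IS NOT NULL
            THEN ' FROM dbo.Orders o WITH (INDEX (PK_Orders))'
            ELSE ' FROM dbo.Orders o'
       END + ' WHERE 1 = 1'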
OPTION (RECOMPILE)

Obviously, the RECOMPILE hint can be used with dynamic SQL as well. If you went for dynamic SQL to get better performance, adding RECOMPILE unconditionally is counterproductive. But if you know that you have some commonly used combination of input parameters where the best plan depends on the values, using RECOMPILE for that combination could make sense. As I've already discussed, there is a cost for compilation, but neither is it any good if a cached plan takes ten seconds for a parameter set it is not well suited for.
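That is, rather than appending the hint unconditionally, you append it only for the known troublesome combination (a sketch of mine):

-- Append OPTION (RECOMPILE) only when both customer and date span are given,
-- the combination where one cached plan can fit another invocation badly.
DECLARE @sql nvarchar(MAX) =
   'SELECT o.OrderID FROM dbo.Orders o
    WHERE o.CustomerID = @custid
      AND o.OrderDate BETWEEN @fromdate AND @todate',
        @custid nchar(5) = 'ERNTC',
        @fromdate datetime = '19980205',
        @todate   datetime = '19980205'
IF @custid IS NOT NULL AND @fromdate IS NOT NULL AND @todate IS NOT NULL
   SELECT @sql = @sql + ' OPTION (RECOMPILE)'
EXEC sp_executesql @sql,
     N'@custid nchar(5), @fromdate datetime, @todate datetime',
     @custid, @fromdate, @todate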
Conclusion

You have now seen several ways to implement this kind of search, both in dynamic SQL and static SQL. You have seen that in SQL 2008 it's possible to get good performance no matter whether you use static or dynamic SQL, but to get static SQL to perform well, you have to pay the price of compiling the query each time, which on a busy system could be expensive. You have also seen that for very simple searches with very few search conditions, the best option may be to keep it simple and use IF statements.

You have seen that a solution in static SQL can be very compact and easy to maintain. The solution in dynamic SQL is more verbose and takes more effort to maintain. But you have also gotten glimpses that if the requirements for the search problem increase in complexity, dynamic SQL is the more viable solution. And again, you need at least CU5 of SQL 2008 SP1 or CU1 of SQL 2008 R2 to be able to use OPTION (RECOMPILE) in this way.
Let me stress once more that, no matter whether you go for dynamic or static SQL, you should test your procedure for all input parameters, and preferably some combinations too, both for correctness and for performance. And to test the performance, you need data that resembles production data. If you expect to have ten million orders and 500,000 customers in production, you cannot play with a toy database the size of Northwind or even Northgale.