SQL for dummies (.NET developers)
March 18th, 2008A serious post? Yeah, I know. Most of you think that it’s not possible. Well shame on you!
I wanted to provide this little awareness piece on basic SQL performance standards. I’m gearing this toward developers that know SQL and can write relatively complex queries and procedures.
The goal is a quick and dirty reference for the most common things you can do to improve query performance, database integrity, and application interfacing. Take note that I am by no means an expert. I have some pretty extensive experience, but I don’t know everything. If you see something that you feel is not correct please correct me!
1. General query performance
Using Indexed columns: When you use WHERE, ORDER, GROUP, or DISTINCT calls you should try to use indexed columns. This will improve performance on those queries. Generally, returning an indexed column will be quicker then a non-indexed column. So using the * in a select is almost never a good idea. Selecting only the columns you need will always provide better performance. On the other hand, maintaining a large number of indexes can adversely affect performance as updates and inserts have to also update the indexes.
Sub queries: A lot of sub queries can be rewritten as a joins or EXISTS or a combination of. IN and NOT IN likewise can usually be expressed as an EXISTS or derived table. These options are almost always quicker. You can use a derived table to achieve the same result and a much faster return.
select account_id, last_update
from account
where account_id in (select top 100 account_id
from account
order by last_update asc)
select account_id, last_update
from (select top 100 account_id, last_update
from account
order by last_update asc) as a
In the above example, the first query will perform worse and worse as more data is thrown at it. The second example will scale quite well and will always out perform the first.
Where clause ordering: A simple performance improvement can be made if you are using multiple where conditions. Make sure you are ordering them from most limiting first to least limiting. That way the first condition is filtered first, leaving you with fewer results to run through the remaining conditions.
2. Locking and Isolation
Using locks and isolation levels can be a bit daunting. But there are a lot of cases where the defaults just don’t give you very good results. For instance, using NOLOCK on a select can improve its performance, reducing the IO time. But it also is considered a dirty read of the data. Meaning it can pull uncommitted data. Depending on the table you are referencing this can be either acceptable or very bad. Either way, knowing the advantages and drawbacks of locking hints are a key to writing good queries that apply an understanding of your database schema to the data. That is not to say that you should start adding lock hits to all your queries. SQL will choose the appropriate locking hint in most cases. But when you need certain behavior it is available to you.
Let’s consider an update to a large table. SQL will determine based on the load and number of updates in the block of rows which type of lock to issue. This can be a page lock (multiple rows) or even a table lock. If this is a table that has to be queried frequently, updates could really hurt performance.
update account with (rowlock)
set last_update = @last_update,
account_balance = @account_balance
where account_id = @account_id
Adding the ROWLOCK to the update asks SQL to only lock the row that is being updated. This has its drawbacks as well, if a large number of updates can occur the server has to maintain a large number of row locks, which can use a huge amount of memory and slow down the entire server.
Isolation levels in a stored procedure have the same effect as putting locks on all the tables. For instance the isolation SET TRANSACTION ISOLATION READ UNCOMMITTED has the same effect as putting WITH (NOLOCK) on all the tables in your procedure. This can be handy when you have a lot of tables that need the same locking hints.
Keep in mind that there are table-level locks as well as row-level locks. MSDN has some good references on the subject.
3. Transactions
In general you should try to use transactions when you are doing more than one thing at a time. Try to keep them short, as a transaction will keep the connection open for the duration of the commands. Also, if you have to hit a heavily accessed table, do so at the end of the transaction. In ASP.NET, the Microsoft Application Blocks library provides us with handy SQL helper methods. Keep in mind that all those methods have an overload that allow you to manually use a transactions if you need to do so in your application.
There is a well referenced case that illustrates the prevalent lack of correct transaction use in stored procedures. I’ve seen this exact query in more then a few incarnations.
CREATE PROCEDURE [dbo].[Account_Add]
@account_id int,
@last_update datetime,
@account_balance decimal
AS
SET NOCOUNT ON
IF EXISTS (select account_id from account where account_id = @account_id)
BEGIN
update account
set last_update = @last_update,
account_balance = @account_balance
where account_id = @ account_id
END
ELSE
BEGIN
insert into account (account_id, last_update, account_balance)
values (@account_id, @last_update, @account_balance)
END
Well, this code assumes that there will only be a single user executing the procedure at a time, which is almost never a safe assumption in the .NET world. You can fix this situation with a few simple updates.
CREATE PROCEDURE [dbo].[Account_Add]
@ account_id int,
@last_update datetime,
@account_balance decimal
AS
SET NOCOUNT ON
SET XACT_ABORT ON
BEGIN TRAN
IF EXISTS (select account_id from account with (UPDLOCK, HOLDLOCK) where account_id = @account_id)
BEGIN
update account
set last_update = @last_update,
account_balance = @account_balance
where account_id = @account_id
END
ELSE
BEGIN
insert into account (account_id, last_update, account_balance)
values (@account_id, @last_update, @account_balance)
END
COMMIT
Notice we have introduced some locking hints and a transaction. The hints tell SQL to lock that row down so other processes have to wait their turn. The transaction allows the locks to be maintained until the query is complete.
4. Things to avoid
- <> and NOT are costly. If possible do not use them. Best to worst performance for the common comparers are “=”, “>”, “>=”, “<”, “<=”, “LIKE”, “<>”.
- Cursers are extremely costly. Avoid if at all possible.
- Temp tables have their uses, but most problems have a much less costly solution.
- Select * is almost always avoidable. Try to only return a minimum number of columns.
- Use properly constructed WHERE clauses to only return what is needed.
- Don’t sort data unless it is needed! Sorts are a performance hit.
- Performance-wise, string manipulation is typically better done in your app, as SQL doesn’t have nearly as efficient functions.
- Don’t leave out the SET NOCOUNT ON in stored procedures! It should be included in all your procedures to reduce round trips.
5. Resources
Microsoft has a Best Practices Analyzer Tool for SQL 2000. I’ve had limited experience with it, but it does highlight a lot of things that you may not have thought of when setting up a server or are trying to tune a database. Overall a good tool to have used at least once and understand!
Use and get comfortable with the Execution Plan in query analyzer and SSMSE. This is probably the best resource you have to identify performance issues with queries.
Another great resource are the Microsoft MVPs. Most have blogs, and most are quite active.
6. Final Remarks
When you are writing queries for an application, you should always leverage which of the two are better at handling problems. If you are afraid of SQL you will make your application do more of the processing work, when frequently your database has much faster and more efficient means of performing those tasks.
Correct use of the tools at your disposal will produce much better solutions to the customer. Mike Minton is always talking about how we need to use the right tool for the problem. Developers need to be careful to utilize SQL’s strengths instead of doing all of their processing on the application side.
Cheers!
















