How do you tune SQL queries to
improve performance?
Tuning
your SQL queries can have a significantly positive impact on performance. And
understanding how your specific RDBMS works can help tremendously as well. But
here we will go over some tips on how to tune SQL queries in general, non-RDBMS
specific terms.
What is a
query execution plan?
Understanding query execution
plans is one of the first steps to properly tuning SQL queries. So, what is a
query execution plan – also known as an explain plan? Well, a query execution
plan lists all the details of how that particular RDBMS plans on processing a
particular query. Inside this plan are details on how the index will be used,
how joins will be performed (and their associated logic), and also an estimate
of the resource cost. Understanding the explain plan utility for your
particular RDBMS is critical if you want to successfully tune SQL queries.
Here are
some non-RDBMS specific things to keep in mind when tuning your SQL queries to
improve performance:
Reduce
the rows that are returned by a query
This is
fairly obvious – if a query returns less rows, then clearly the query will be
more efficient.
Get rid
of unnecessary columns and tables
Even
though this isn’t a change made to your actual query, the less unused space in
your database, the more efficient your queries will be. This is another pretty
obvious one.
GROUP BY may be better to use
than DISTINCT. In some DBMS’s GROUP BY is a more efficient way of retrieving
unique rows than DISTINCT. This is because GROUP BY performs the sort that finds
duplicates earlier in the processing of a query than DISTINCT. The DISTINCT
clause will perform the sort at the very last step, and will do this against
the final result set. Why does it matter when the sort is performed? Well, if
duplicate rows are eliminated earlier on in the processing of a query, then it
means that the rest of the processing of the query will be more efficient
because there will presumably be less rows to perform the rest of processing
on, since the duplicates have already been eliminated. For your particular
RDBMS, you should look at the explain plans for running a query with GROUP BY
or DISTINCT to see how they compare.
Hints
might help you tune your SQL queries. What is a hint? A hint is special syntax
that you can put inside your SQL. What does a hint do? Well, it tells the query
optimizer to perform a certain action, like if you want to tell the optimizer
to use a certain method to join tables, or if you want to tell the optimizer to
use a certain index.
Understand
your optimizer to help you tune SQL queries
Knowing
how the query optimizer for your particular RDBMS works can be a big help. This
is because every optimizer does things differently. Let’s go through some
things that you should keep in mind when dealing with query optimizers:
·
Not enough database statistics
Suppose
there are not enough statistics about the database. Since cost based optimizers rely on those statistics to perform
their analysis, some optimizers may have to use a rule based optimizer instead
in that case. And other databases may decide not to use an index at all and
just do a full table scan instead.
·
Are order of predicates taken into account?
You
should know whether or not your optimizer takes the order of the predicates in
a WHERE clause into account, and whether that order has any effect on the order
in which the predicates are actually evaluated. What does that mean in plain English?
Well, a predicate is the comparison portion of the WHERE clause. So, for
example, if we have some SQL that says “WHERE website_name =
“ProgrammerInterview.com”, then in that particular SQL query there is just one
predicate comparing the website_name column to the text
“ProgrammerInterview.com”. But, if we have some SQL that says “WHERE
website_name = “ProgrammerInterview.com AND website_subject=”technical””, then
we have two different predicates – one that checks for the website name and
another that checks for the website category.
Now that
we’ve cleared up what we mean by predicates, let’s get back to the original
topic. So, we said that you should know if your optimizer takes the order of the predicates in a WHERE clause
into account, and if that order affects the order in which the predicates are
evaluated. But, why should the order of the predicates matter? Well, if you
optimizer does take the order into account, then you would want the predicate
that eliminates the higher number of rows to be evaluated first by the
optimizer. So, for example, let’s say that we have a table called
Websites which has columns for the website_active and the website_subject. The
website_active column is just a “yes” or a “no” entry, and let’s assume that
most of the rows (something like 90%) in the table have a “yes” value for
website_active. But, let’s also say that there are three possible subjects –
like “technical”, “self help”, “cooking”, etc. And, the subjects are evenly
distributed amongst the rows – so 1/3rd of the rows are technical, 1/3rd are
self help, etc.
Now,
let’s say we want to run a query with a where clause like this “WHERE
website_subject = ‘cooking’ AND website_active = ‘NO’. Which predicate of the
WHERE clause should be executed first – the website_active = ‘NO’ or the
website_subject = ‘cooking’? Well, think about that on your own for a second.
Wouldn’t it make more sense to run the predicate which eliminates more rows
first? That way, the second predicate has less rows to process. With that in mind,
let’s ask ourself which predicate will eliminate more rows? Wouldn’t it be the
check to see “WHERE website_active = ‘NO’? Because, that check will eliminate
90% of the rows in the table. But the check for “website_subject = ‘cooking'”
would only eliminate 67% of the rows, so clearly 90% is better which means that
the predicate that checks website_active = ‘NO’ should be run first.
·
Are order of table names being taken into account?
Just as
the order of the predicates being used in the
WHERE clause can have a big effect on the efficiency of the query, so can the
order of the table names in the JOIN or FROM clause. This is especially true
with rule based optimizers. Of course, the best thing for the RDBMS to do is to
choose the most selective table first. This way the most number
of rows will be removed from the result set, which means that less rows will
have to be processed and the query can run more efficiently. You should check
to see what your particular optimizer does to see if you need to tune your SQL
queries accordingly.
·
Are queries being rewritten?
You
should also check to see if your optimizer rewrites queries into more efficient
formats. One example of this is when optimizers will rewrite subqueries into
their equivalent joins, and that will make the processing that must follow much
simpler. For some DBMS’s, there are certain options that have to be enabled so
that the optimizer can actually rewrite queries.
Tune SQL
queries by ensuring that your indexes perform well
One very
important thing that can help your SQL queries run better is making sure that
your database indexes also perform well. Read here for more details on that
subject:Improving Index performance in SQL.
Summary
of tips on how to to tune your SQL queries
In
general, you should know the options available to you that may help you in
tuning your SQL queries. Of course, not everything we presented above will help
make your SQL perform better – because everyone’s particular situation is
different. But, knowing your options is critical – as the saying goes “To the
man with a hammer, every problem looks like a nail.” Make sure that you have
more than just a hammer in your toolbox!
No comments:
Post a Comment