Thursday, February 16, 2012

>= AND <= or just = ?

This is just a general question. I just ran a query with several columns in
the WHERE Clause. It took about 10 seconds to complete. The WHERE clause
looked something like this:
DBCC DROPCLEANBUFFERS
GO
... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year = 1960 ...
GO
I played around with it, and found that re-forming the WHERE Clause like
this:
DBCC DROPCLEANBUFFERS
GO
... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >= 'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <= 1960
...
GO
Resulted in the same query being executed in less than 1 second. Anyone
know why these two seemingly equivalent WHERE clauses would be so
drastically different in practice?Maybe the data is part of the dirty buffers, and is retrieved from cache
in the second query?
But seriously: check out the query plans and look at the differences.
There lies the answer.
If there really is a significant difference, then it would be
interesting to know what difference the query plan shows...
Gert-Jan
Michael C# wrote:
> This is just a general question. I just ran a query with several columns in
> the WHERE Clause. It took about 10 seconds to complete. The WHERE clause
> looked something like this:
> DBCC DROPCLEANBUFFERS
> GO
> ... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year = 1960 ...
> GO
> I played around with it, and found that re-forming the WHERE Clause like
> this:
> DBCC DROPCLEANBUFFERS
> GO
> ... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >=> 'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <= 1960
> ...
> GO
> Resulted in the same query being executed in less than 1 second. Anyone
> know why these two seemingly equivalent WHERE clauses would be so
> drastically different in practice?|||According to the Query Plans it looks like switching from "=" syntax to ">=AND <=" syntax cut down the Estimated Rows on my Clustered Index Seek from
about 350,000 to about 7,000. That's the only difference - but wow, what a
difference! Anyone have any ideas on why this happens, and better yet, why
the Query Optimizer doesn't convert "=" to ">= AND <="? Now I'm wondering
what kind of effect it will have on non-clustered indexes and non-indexed
fields...
Thanks
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:42601468.9462CCDF@.toomuchspamalready.nl...
> Maybe the data is part of the dirty buffers, and is retrieved from cache
> in the second query?
> But seriously: check out the query plans and look at the differences.
> There lies the answer.
> If there really is a significant difference, then it would be
> interesting to know what difference the query plan shows...
> Gert-Jan
>
> Michael C# wrote:
>> This is just a general question. I just ran a query with several columns
>> in
>> the WHERE Clause. It took about 10 seconds to complete. The WHERE
>> clause
>> looked something like this:
>> DBCC DROPCLEANBUFFERS
>> GO
>> ... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year = 1960
>> ...
>> GO
>> I played around with it, and found that re-forming the WHERE Clause like
>> this:
>> DBCC DROPCLEANBUFFERS
>> GO
>> ... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >=>> 'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <=>> 1960
>> ...
>> GO
>> Resulted in the same query being executed in less than 1 second. Anyone
>> know why these two seemingly equivalent WHERE clauses would be so
>> drastically different in practice?|||Tried it with a query on a couple of columns in a non-clustered Index, and
ended up with not-so-promising results. So far it appears to work best on
Clustered Indexes...
Thanks
"Michael C#" <howsa@.boutdat.com> wrote in message
news:Og1YtKfQFHA.2584@.TK2MSFTNGP15.phx.gbl...
> According to the Query Plans it looks like switching from "=" syntax to
> ">= AND <=" syntax cut down the Estimated Rows on my Clustered Index Seek
> from about 350,000 to about 7,000. That's the only difference - but wow,
> what a difference! Anyone have any ideas on why this happens, and better
> yet, why the Query Optimizer doesn't convert "=" to ">= AND <="? Now I'm
> wondering what kind of effect it will have on non-clustered indexes and
> non-indexed fields...
> Thanks
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:42601468.9462CCDF@.toomuchspamalready.nl...
>> Maybe the data is part of the dirty buffers, and is retrieved from cache
>> in the second query?
>> But seriously: check out the query plans and look at the differences.
>> There lies the answer.
>> If there really is a significant difference, then it would be
>> interesting to know what difference the query plan shows...
>> Gert-Jan
>>
>> Michael C# wrote:
>> This is just a general question. I just ran a query with several
>> columns in
>> the WHERE Clause. It took about 10 seconds to complete. The WHERE
>> clause
>> looked something like this:
>> DBCC DROPCLEANBUFFERS
>> GO
>> ... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year = 1960
>> ...
>> GO
>> I played around with it, and found that re-forming the WHERE Clause like
>> this:
>> DBCC DROPCLEANBUFFERS
>> GO
>> ... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >=>> 'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <=>> 1960
>> ...
>> GO
>> Resulted in the same query being executed in less than 1 second. Anyone
>> know why these two seemingly equivalent WHERE clauses would be so
>> drastically different in practice?
>|||I hope column DOB_Year doesn't happen to be a varchar? That would be an
explanation.
Does the performance also increase if you change just one of the
predicates? Or does a rewrite of each predicate improve the performance?
If the guess (above) about the data type turns out to be correct, then
only the predicate with DOB_Year would make the difference.
In addition, if the clustered index seek cuts down the estimated number
of rows, then the seek parameters must be different (or a different
index is used).
Gert-Jan
Michael C# wrote:
> According to the Query Plans it looks like switching from "=" syntax to ">=> AND <=" syntax cut down the Estimated Rows on my Clustered Index Seek from
> about 350,000 to about 7,000. That's the only difference - but wow, what a
> difference! Anyone have any ideas on why this happens, and better yet, why
> the Query Optimizer doesn't convert "=" to ">= AND <="? Now I'm wondering
> what kind of effect it will have on non-clustered indexes and non-indexed
> fields...
> Thanks
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:42601468.9462CCDF@.toomuchspamalready.nl...
> > Maybe the data is part of the dirty buffers, and is retrieved from cache
> > in the second query?
> >
> > But seriously: check out the query plans and look at the differences.
> > There lies the answer.
> >
> > If there really is a significant difference, then it would be
> > interesting to know what difference the query plan shows...
> >
> > Gert-Jan
> >
> >
> > Michael C# wrote:
> >>
> >> This is just a general question. I just ran a query with several columns
> >> in
> >> the WHERE Clause. It took about 10 seconds to complete. The WHERE
> >> clause
> >> looked something like this:
> >>
> >> DBCC DROPCLEANBUFFERS
> >> GO
> >> ... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year = 1960
> >> ...
> >> GO
> >>
> >> I played around with it, and found that re-forming the WHERE Clause like
> >> this:
> >>
> >> DBCC DROPCLEANBUFFERS
> >> GO
> >> ... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >=> >> 'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <=> >> 1960
> >> ...
> >> GO
> >>
> >> Resulted in the same query being executed in less than 1 second. Anyone
> >> know why these two seemingly equivalent WHERE clauses would be so
> >> drastically different in practice?|||DOB_Year is an INT. In addition, there are DOB_Month (INT) and DOB_Day
(INT) columns. I re-wrote all the predicates - did not try them
individually since this got me such a good result (no point in breaking it).
There is one Clustered Index on this table. No non-clustered indexes.
The Seek Parameters must be different then, and it appears to be caused by
the different operators used. Nothing else on the query, or on the table or
index have been changed. I just find this whole thing fascinating. I was
thinking this might be common knowledge I had missed out on somewhere along
the way. Anyways, thanks.
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:426022AB.7C56005D@.toomuchspamalready.nl...
>I hope column DOB_Year doesn't happen to be a varchar? That would be an
> explanation.
> Does the performance also increase if you change just one of the
> predicates? Or does a rewrite of each predicate improve the performance?
> If the guess (above) about the data type turns out to be correct, then
> only the predicate with DOB_Year would make the difference.
> In addition, if the clustered index seek cuts down the estimated number
> of rows, then the seek parameters must be different (or a different
> index is used).
> Gert-Jan
>
> Michael C# wrote:
>> According to the Query Plans it looks like switching from "=" syntax to
>> ">=>> AND <=" syntax cut down the Estimated Rows on my Clustered Index Seek
>> from
>> about 350,000 to about 7,000. That's the only difference - but wow, what
>> a
>> difference! Anyone have any ideas on why this happens, and better yet,
>> why
>> the Query Optimizer doesn't convert "=" to ">= AND <="? Now I'm
>> wondering
>> what kind of effect it will have on non-clustered indexes and non-indexed
>> fields...
>> Thanks
>> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
>> news:42601468.9462CCDF@.toomuchspamalready.nl...
>> > Maybe the data is part of the dirty buffers, and is retrieved from
>> > cache
>> > in the second query?
>> >
>> > But seriously: check out the query plans and look at the differences.
>> > There lies the answer.
>> >
>> > If there really is a significant difference, then it would be
>> > interesting to know what difference the query plan shows...
>> >
>> > Gert-Jan
>> >
>> >
>> > Michael C# wrote:
>> >>
>> >> This is just a general question. I just ran a query with several
>> >> columns
>> >> in
>> >> the WHERE Clause. It took about 10 seconds to complete. The WHERE
>> >> clause
>> >> looked something like this:
>> >>
>> >> DBCC DROPCLEANBUFFERS
>> >> GO
>> >> ... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year =>> >> 1960
>> >> ...
>> >> GO
>> >>
>> >> I played around with it, and found that re-forming the WHERE Clause
>> >> like
>> >> this:
>> >>
>> >> DBCC DROPCLEANBUFFERS
>> >> GO
>> >> ... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >=>> >> 'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <=>> >> 1960
>> >> ...
>> >> GO
>> >>
>> >> Resulted in the same query being executed in less than 1 second.
>> >> Anyone
>> >> know why these two seemingly equivalent WHERE clauses would be so
>> >> drastically different in practice?|||> (INT) columns. I re-wrote all the predicates - did not try them
> individually since this got me such a good result (no point in breaking
> it).
Might be worth trying it so you'll know exactly what is going on next time
this comes up. Wouldn't hurt to validate Gert-Jan's theories, either.|||I can't get on the server right now - they're doing some network maintenance
at the facility. I'll duplicate the code and play around with it when I can
get access to them. BTW, the theory is that "changing DOB_Year from '='
format to '>= AND <=' format is the only change that makes a difference"
right? I'll test that later, but it would be interesting to know the train
of thought behind that particular theory. Thanks.
"Aaron [SQL Server MVP]" <ten.xoc@.dnartreb.noraa> wrote in message
news:OemA8zfQFHA.2948@.TK2MSFTNGP14.phx.gbl...
>> (INT) columns. I re-wrote all the predicates - did not try them
>> individually since this got me such a good result (no point in breaking
>> it).
> Might be worth trying it so you'll know exactly what is going on next time
> this comes up. Wouldn't hurt to validate Gert-Jan's theories, either.
>|||If the only index on the table is a clustered index, then only the
columns in this index are relevant. If any of the three column
(FirstName, LastName, DOB_Year) is not part of this index, then
rewriting them probably makes no difference.
When you look at the query plan, make sure you differentiate between
SEEK parameter and the predicates mentioned in the WHERE clause of the
SEEK operator. It is the SEEK parameter that primarily determines the
performance, because that determines which rows are read.
I must say, I am starting to get very curious to see the actual
(estimated) query plans of both queries. If you run "SET SHOWPLAN_TEXT
ON" before you run the query, then SQL-Server will only generate a query
plan. Please post both query plans. Maybe there is a data type mismatch
somewhere, maybe there is a bug (or flaw) that you have uncovered, maybe
we have overlooked something.
Gert-Jan
Michael C# wrote:
> DOB_Year is an INT. In addition, there are DOB_Month (INT) and DOB_Day
> (INT) columns. I re-wrote all the predicates - did not try them
> individually since this got me such a good result (no point in breaking it).
> There is one Clustered Index on this table. No non-clustered indexes.
> The Seek Parameters must be different then, and it appears to be caused by
> the different operators used. Nothing else on the query, or on the table or
> index have been changed. I just find this whole thing fascinating. I was
> thinking this might be common knowledge I had missed out on somewhere along
> the way. Anyways, thanks.
<snip>|||OK, they just got the network up down there and I was able to run a few
tests. Here are the results for you:
Test 1:
-- This one (in the Graphical Execution Plan) displayed an Estimated Row
Count of 7,328. Ran in less than 1 sec.
SET SHOWPLAN_TEXT ON
GO
SELECT od.OffenderID
FROM Offender_Details od
WHERE od.LName = 'Smith'
AND od.FName = 'James'
AND od.DOB_Year >= 1960
AND od.DOB_Year <= 1960
GO
SET SHOWPLAN_TEXT OFF
GO
-- ShowPlan Results:
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName = 'James' AND od.DOB_Year >= 1960 AND od.DOB_Year <= 1960
-- |--Clustered Index
Seek(OBJECT:([OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
AS [od]), SEEK:([od].[LName]=[@.1] AND [od].[FName]=[@.2] AND [od].[DOB_Year]
>= [@.3] AND [od].[DOB_Year] <= [@.4]) ORDERED FORWARD)
Test 2:
-- This one (in the Graphical Execution Plan) displayed an Estimated Row
Count of 7,678. Ran in less than 1 sec.
SET SHOWPLAN_TEXT ON
GO
SELECT od.OffenderID
FROM Offender_Details od
WHERE od.LName = 'Smith'
AND od.FName >= 'James'
AND od.FName <= 'James'
AND od.DOB_Year = 1960
GO
SET SHOWPLAN_TEXT OFF
GO
--ShowPlan Results:
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName >= 'James' AND od.FName <= 'James' AND od.DOB_Year = 1960
-- |--Clustered Index
Seek(OBJECT:([OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
AS [od]), SEEK:([od].[LName]=[@.1] AND ([od].[FName], [od].[DOB_Year]) >=([@.2], [@.4]) AND ([od].[FName], [od].[DOB_Year]) <= ([@.3], [@.4])),
WHERE:([od].[DOB_Ye
Test 3:
-- This one (in the Graphical Execution Plan) displayed an Estimated Row
Count of 357,915. Took about 10 secs to run.
SET SHOWPLAN_TEXT ON
GO
SELECT od.OffenderID
FROM Offender_Details od
WHERE od.LName = 'Smith'
AND od.FName = 'James'
AND od.DOB_Year = 1960
GO
SET SHOWPLAN_TEXT OFF
GO
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName = 'James' AND od.DOB_Year = 1960
-- |--Clustered Index
Seek(OBJECT:([OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
AS [od]), SEEK:([od].[LName]=[@.1] AND [od].[FName]=[@.2] AND
[od].[DOB_Year]=[@.3]) ORDERED FORWARD)
The FName and LName columns are VARCHAR(32) NOT NULL. The DOB_Year column
is INT NOT NULL. OffenderID column is a BIGINT. It is the Primary Key, but
is non-clustered. It is not part of the Clustered Index. The table has a
clustered index of (FName, LName, DOB_Year). Table has about 15 million
rows in it currently. I tried the same type of thing with a non-clustered
index columns on a different table and ended up with less inspiring results.
I also tried it on a couple non-indexed columns, and confirmed (for myself
at least) that there was no point to that test :)
It seems like >= AND <= in place of = on the clustered index columns made a
significant difference for me. It would be cool if the Query Optimizer
would automatically take care of this conversion for me internally in this
situation. I would think SQL Server would recognize the >= AND <= and the =as being equivalent and handle that when it generated the plan.
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:42602F3B.9612B18B@.toomuchspamalready.nl...
> If the only index on the table is a clustered index, then only the
> columns in this index are relevant. If any of the three column
> (FirstName, LastName, DOB_Year) is not part of this index, then
> rewriting them probably makes no difference.
> When you look at the query plan, make sure you differentiate between
> SEEK parameter and the predicates mentioned in the WHERE clause of the
> SEEK operator. It is the SEEK parameter that primarily determines the
> performance, because that determines which rows are read.
> I must say, I am starting to get very curious to see the actual
> (estimated) query plans of both queries. If you run "SET SHOWPLAN_TEXT
> ON" before you run the query, then SQL-Server will only generate a query
> plan. Please post both query plans. Maybe there is a data type mismatch
> somewhere, maybe there is a bug (or flaw) that you have uncovered, maybe
> we have overlooked something.
> Gert-Jan
>
> Michael C# wrote:
>> DOB_Year is an INT. In addition, there are DOB_Month (INT) and DOB_Day
>> (INT) columns. I re-wrote all the predicates - did not try them
>> individually since this got me such a good result (no point in breaking
>> it).
>> There is one Clustered Index on this table. No non-clustered indexes.
>> The Seek Parameters must be different then, and it appears to be caused
>> by
>> the different operators used. Nothing else on the query, or on the table
>> or
>> index have been changed. I just find this whole thing fascinating. I
>> was
>> thinking this might be common knowledge I had missed out on somewhere
>> along
>> the way. Anyways, thanks.
> <snip>|||BTW, in the end, this particular query returns just one record... although
there are situations where I'll be pulling back as many as 500+ records with
a search like this. Thanks.
"Michael C#" <howsa@.boutdat.com> wrote in message
news:OYBUAVgQFHA.1392@.TK2MSFTNGP10.phx.gbl...
> OK, they just got the network up down there and I was able to run a few
> tests. Here are the results for you:
> Test 1:
> -- This one (in the Graphical Execution Plan) displayed an Estimated Row
> Count of 7,328. Ran in less than 1 sec.
> SET SHOWPLAN_TEXT ON
> GO
> SELECT od.OffenderID
> FROM Offender_Details od
> WHERE od.LName = 'Smith'
> AND od.FName = 'James'
> AND od.DOB_Year >= 1960
> AND od.DOB_Year <= 1960
> GO
> SET SHOWPLAN_TEXT OFF
> GO
> -- ShowPlan Results:
> --SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith'
> AND od.FName = 'James' AND od.DOB_Year >= 1960 AND od.DOB_Year <= 1960
> -- |--Clustered Index
> Seek(OBJECT:([OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
> AS [od]), SEEK:([od].[LName]=[@.1] AND [od].[FName]=[@.2] AND
> [od].[DOB_Year]
> >= [@.3] AND [od].[DOB_Year] <= [@.4]) ORDERED FORWARD)
>
> Test 2:
> -- This one (in the Graphical Execution Plan) displayed an Estimated Row
> Count of 7,678. Ran in less than 1 sec.
> SET SHOWPLAN_TEXT ON
> GO
> SELECT od.OffenderID
> FROM Offender_Details od
> WHERE od.LName = 'Smith'
> AND od.FName >= 'James'
> AND od.FName <= 'James'
> AND od.DOB_Year = 1960
> GO
> SET SHOWPLAN_TEXT OFF
> GO
> --ShowPlan Results:
> --SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith'
> AND od.FName >= 'James' AND od.FName <= 'James' AND od.DOB_Year = 1960
> -- |--Clustered Index
> Seek(OBJECT:([OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
> AS [od]), SEEK:([od].[LName]=[@.1] AND ([od].[FName], [od].[DOB_Year]) >=> ([@.2], [@.4]) AND ([od].[FName], [od].[DOB_Year]) <= ([@.3], [@.4])),
> WHERE:([od].[DOB_Ye
> Test 3:
> -- This one (in the Graphical Execution Plan) displayed an Estimated Row
> Count of 357,915. Took about 10 secs to run.
> SET SHOWPLAN_TEXT ON
> GO
> SELECT od.OffenderID
> FROM Offender_Details od
> WHERE od.LName = 'Smith'
> AND od.FName = 'James'
> AND od.DOB_Year = 1960
> GO
> SET SHOWPLAN_TEXT OFF
> GO
> --SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith'
> AND od.FName = 'James' AND od.DOB_Year = 1960
> -- |--Clustered Index
> Seek(OBJECT:([OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
> AS [od]), SEEK:([od].[LName]=[@.1] AND [od].[FName]=[@.2] AND
> [od].[DOB_Year]=[@.3]) ORDERED FORWARD)
> The FName and LName columns are VARCHAR(32) NOT NULL. The DOB_Year column
> is INT NOT NULL. OffenderID column is a BIGINT. It is the Primary Key,
> but is non-clustered. It is not part of the Clustered Index. The table
> has a clustered index of (FName, LName, DOB_Year). Table has about 15
> million rows in it currently. I tried the same type of thing with a
> non-clustered index columns on a different table and ended up with less
> inspiring results. I also tried it on a couple non-indexed columns, and
> confirmed (for myself at least) that there was no point to that test :)
> It seems like >= AND <= in place of = on the clustered index columns made
> a significant difference for me. It would be cool if the Query Optimizer
> would automatically take care of this conversion for me internally in this
> situation. I would think SQL Server would recognize the >= AND <= and the
> = as being equivalent and handle that when it generated the plan.
>
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:42602F3B.9612B18B@.toomuchspamalready.nl...
>> If the only index on the table is a clustered index, then only the
>> columns in this index are relevant. If any of the three column
>> (FirstName, LastName, DOB_Year) is not part of this index, then
>> rewriting them probably makes no difference.
>> When you look at the query plan, make sure you differentiate between
>> SEEK parameter and the predicates mentioned in the WHERE clause of the
>> SEEK operator. It is the SEEK parameter that primarily determines the
>> performance, because that determines which rows are read.
>> I must say, I am starting to get very curious to see the actual
>> (estimated) query plans of both queries. If you run "SET SHOWPLAN_TEXT
>> ON" before you run the query, then SQL-Server will only generate a query
>> plan. Please post both query plans. Maybe there is a data type mismatch
>> somewhere, maybe there is a bug (or flaw) that you have uncovered, maybe
>> we have overlooked something.
>> Gert-Jan
>>
>> Michael C# wrote:
>> DOB_Year is an INT. In addition, there are DOB_Month (INT) and DOB_Day
>> (INT) columns. I re-wrote all the predicates - did not try them
>> individually since this got me such a good result (no point in breaking
>> it).
>> There is one Clustered Index on this table. No non-clustered indexes.
>> The Seek Parameters must be different then, and it appears to be caused
>> by
>> the different operators used. Nothing else on the query, or on the
>> table or
>> index have been changed. I just find this whole thing fascinating. I
>> was
>> thinking this might be common knowledge I had missed out on somewhere
>> along
>> the way. Anyways, thanks.
>> <snip>
>|||Michael,
Thanks for posting the query plans. It is an interesting case.
The third query plan should be lightning fast! It does an exact match on
the clustered index key. It should be faster than queries 1 and 2, which
only partially seek the (same) index. Query 2 should be the slowest.
Do you have the latest service pack installed?
The only explanation I can think of, is that the third query is not
actually using the clustered index, but is scanning the primary key
index. All nonclustered indexes also contain the clustered index key.
For your query, the primary key index is a covering index. In that
scenario, the (much higher) row count estimation would make sense.
However, that is not what the query plan is reporting, so there is a bug
somewhere...
SQL-Server would consider two access paths for your query, and selected
the fastest.
Scenario 1: seek the nonunique clustered index and return the all
corresponding rows.
Scenario 2: scan the (entire) unique primary key index and return the
corresponding OffenderID.
SQL-Server will optimize for a cold cache, so it will minimize the
reads. If the table is very wide, and the primary key index narrow, then
reading the entire primary key index might involve less reads than
seeking the clustered index. In that case (and only in that case) it
would select scenario 1.
You probably have a hot cache. Most of the table data will be in memory.
For your situation SQL-Server's choice is obviously a very poor choice.
Maybe because of its focus on a cold cache, but most likely it is simply
a bug / flaw.
You can use SQL Profiler to view the actual query plan (event
Performance/Execution Plan), or in SQL Query Analyzer, press CTRL + K
before running the query. Maybe that will report a nonclustered index
scan for the 'bad' query plan...
Hope this helps,
Gert-Jan
<snip>
> Test 3:
> -- This one (in the Graphical Execution Plan) displayed an Estimated Row
> Count of 357,915. Took about 10 secs to run.
> SET SHOWPLAN_TEXT ON
> GO
> SELECT od.OffenderID
> FROM Offender_Details od
> WHERE od.LName = 'Smith'
> AND od.FName = 'James'
> AND od.DOB_Year = 1960
> GO
> SET SHOWPLAN_TEXT OFF
> GO
> --SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
> od.FName = 'James' AND od.DOB_Year = 1960
> -- |--Clustered Index
> Seek(OBJECT:([OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
> AS [od]), SEEK:([od].[LName]=[@.1] AND [od].[FName]=[@.2] AND
> [od].[DOB_Year]=[@.3]) ORDERED FORWARD)
<snip>
> OffenderID column is a BIGINT. It is the Primary Key, but is non-clustered.
<snip>|||"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:42604250.4392E900@.toomuchspamalready.nl...
> Michael,
> Thanks for posting the query plans. It is an interesting case.
> The third query plan should be lightning fast! It does an exact match on
> the clustered index key. It should be faster than queries 1 and 2, which
> only partially seek the (same) index. Query 2 should be the slowest.
> Do you have the latest service pack installed?
Service Pack 3a. I'm testing Service Pack 4 Beta on another machine but it
doesn't have this particular database installed on it. And no way am I
putting a Beta on my production server :)
> The only explanation I can think of, is that the third query is not
> actually using the clustered index, but is scanning the primary key
> index. All nonclustered indexes also contain the clustered index key.
> For your query, the primary key index is a covering index. In that
> scenario, the (much higher) row count estimation would make sense.
> However, that is not what the query plan is reporting, so there is a bug
> somewhere...
You're right, the query plan doesn't seem to be concerned with the Primary
Key, which is good :) But the extremely high row count estimation is
throwing me a little off here. I suppose if they have a fix for this in SP
4, then I'll fix my query accordingly when it's released. In the meantime,
I'll just leave it in the >= AND <= format.
> SQL-Server would consider two access paths for your query, and selected
> the fastest.
> Scenario 1: seek the nonunique clustered index and return the all
> corresponding rows.
> Scenario 2: scan the (entire) unique primary key index and return the
> corresponding OffenderID.
> SQL-Server will optimize for a cold cache, so it will minimize the
> reads. If the table is very wide, and the primary key index narrow, then
> reading the entire primary key index might involve less reads than
> seeking the clustered index. In that case (and only in that case) it
> would select scenario 1.
> You probably have a hot cache. Most of the table data will be in memory.
> For your situation SQL-Server's choice is obviously a very poor choice.
> Maybe because of its focus on a cold cache, but most likely it is simply
> a bug / flaw.
> You can use SQL Profiler to view the actual query plan (event
> Performance/Execution Plan), or in SQL Query Analyzer, press CTRL + K
> before running the query. Maybe that will report a nonclustered index
> scan for the 'bad' query plan...
Already done that :)
I've been performing DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE before
each test. Is there another statement I can use to ensure all cached data
is cleared out before I test it again?
When I look at the actual query plan, I get the following results:
For Test #1:
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName = 'James' AND od.DOB_Year >= 1960 AND od.DOB_Year <= 1960
Phys. Op: Clustered Index Seek
Log. Op: Clustered Index Seek
Row Count: 226
Est. Row Size: 15
I/O Cost: 0.0750
CPU Cost: 0.00813
No. of Executes: 1
Cost: 0.083195
Subtree Cost: 0.0831
Est. Row Count: 7,328
For Test #3:
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName = 'James' AND od.DOB_Year = 1960
Phys. Op: Clustered Index Seek
Log. Op: Clustered Index Seek
Row Count: 120
Est. Row Size: 15
I/O Cost: 3.53
CPU Cost: 0.393
No. of Executes: 1
Cost: 3.926618
Subtree Cost: 3.93
Est. Row Count: 357,915
> Hope this helps,
> Gert-Jan
Actually I still don't understand why "=" on a clustered index seek is
slower than ">= AND <=", but for now I'm just happy I stumbled onto the
info. Thanks.|||<snip>
> I've been performing DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE before
> each test. Is there another statement I can use to ensure all cached data
> is cleared out before I test it again?
No, that's about it. The only thing you could add is CHECKPOINT so the
dirty pages are written to disk.
And since you were testing with a cold cache, this affirms that it is a
bug.
<snip>
> Actually I still don't understand why "=" on a clustered index seek is
> slower than ">= AND <=", but for now I'm just happy I stumbled onto the
> info. Thanks.
It isn't faster. Or at least, it shouldn't be. This is definitely a bug.
I have never seen that the query plan reports a clustered index seek and
that the index is not actually seeked.
I tried to reproduce the problem by creating a table with similar
properties as yours. I tried it with a 5 GB 15 mln row table and with a
9 GB 10 mln row table, but didn't succeed in replicating the problem.
But I did notice a very small difference. Your query plan reports
"[od].[DOB_Year]=[@.3]" whereas mine reported
"[od].[DOB_Year]=Convert([@.3])", but I don't see how that would explain
the problem.
Since it is a bug, maybe one of these syntaxis work around it:
1) force use of clustered index
SELECT OffenderID FROM Offender_Details (index=1)
WHERE LName = 'Smith' AND FName = 'James' AND DOB_Year = 1960
2) make sure it is not the parallellism bug
SELECT OffenderID FROM Offender_Details
WHERE LName = 'Smith' AND FName = 'James' AND DOB_Year = 1960
OPTION (maxdop 1)
3) make sure it is not a casting problem
SELECT OffenderID FROM Offender_Details
WHERE LName = 'Smith' AND FName = 'James' AND DOB_Year = CAST(1960 as
int)
Maybe somebody else can reproduce the problem, or maybe it is possible
to create a repro script. I am out of ideas...
Gert-Jan

No comments:

Post a Comment