Thursday, February 16, 2012

>= AND <= or just = ?

This is just a general question. I just ran a query with several columns in
the WHERE Clause. It took about 10 seconds to complete. The WHERE clause
looked something like this:
DBCC DROPCLEANBUFFERS
GO
... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year = 1960 ...
GO
I played around with it, and found that re-forming the WHERE Clause like
this:
DBCC DROPCLEANBUFFERS
GO
... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >=
'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <= 1960
...
GO
Resulted in the same query being executed in less than 1 second. Anyone
know why these two seemingly equivalent WHERE clauses would be so
drastically different in practice?
Maybe the data is part of the dirty buffers, and is retrieved from cache
in the second query?
But seriously: check out the query plans and look at the differences.
There lies the answer.
If there really is a significant difference, then it would be
interesting to know what difference the query plan shows...
Gert-Jan
Michael C# wrote:
> This is just a general question. I just ran a query with several columns in
> the WHERE Clause. It took about 10 seconds to complete. The WHERE clause
> looked something like this:
> DBCC DROPCLEANBUFFERS
> GO
> ... WHERE LastName = 'Smith' AND FirstName = 'James' AND DOB_Year = 1960 ...
> GO
> I played around with it, and found that re-forming the WHERE Clause like
> this:
> DBCC DROPCLEANBUFFERS
> GO
> ... WHERE LastName >= 'Smith' AND LastName <= 'Smith' AND FirstName >=
> 'James' AND FirstName <= 'James' AND DOB_Year >= 1960 AND DOB_Year <= 1960
> ...
> GO
> Resulted in the same query being executed in less than 1 second. Anyone
> know why these two seemingly equivalent WHERE clauses would be so
> drastically different in practice?
|||According to the Query Plans it looks like switching from "=" syntax to ">=
AND <=" syntax cut down the Estimated Rows on my Clustered Index Seek from
about 350,000 to about 7,000. That's the only difference - but wow, what a
difference! Anyone have any ideas on why this happens, and better yet, why
the Query Optimizer doesn't convert "=" to ">= AND <="? Now I'm wondering
what kind of effect it will have on non-clustered indexes and non-indexed
fields...
Thanks
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:42601468.9462CCDF@.toomuchspamalready.nl...[vbcol=seagreen]
> Maybe the data is part of the dirty buffers, and is retrieved from cache
> in the second query?
> But seriously: check out the query plans and look at the differences.
> There lies the answer.
> If there really is a significant difference, then it would be
> interesting to know what difference the query plan shows...
> Gert-Jan
>
> Michael C# wrote:
|||Tried it with a query on a couple of columns in a non-clustered Index, and
ended up with not-so-promising results. So far it appears to work best on
Clustered Indexes...
Thanks
"Michael C#" <howsa@.boutdat.com> wrote in message
news:Og1YtKfQFHA.2584@.TK2MSFTNGP15.phx.gbl...
> According to the Query Plans it looks like switching from "=" syntax to
> ">= AND <=" syntax cut down the Estimated Rows on my Clustered Index Seek
> from about 350,000 to about 7,000. That's the only difference - but wow,
> what a difference! Anyone have any ideas on why this happens, and better
> yet, why the Query Optimizer doesn't convert "=" to ">= AND <="? Now I'm
> wondering what kind of effect it will have on non-clustered indexes and
> non-indexed fields...
> Thanks
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:42601468.9462CCDF@.toomuchspamalready.nl...
>
|||I hope column DOB_Year doesn't happen to be a varchar? That would be an
explanation.
Does the performance also increase if you change just one of the
predicates? Or does a rewrite of each predicate improve the performance?
If the guess (above) about the data type turns out to be correct, then
only the predicate with DOB_Year would make the difference.
In addition, if the clustered index seek cuts down the estimated number
of rows, then the seek parameters must be different (or a different
index is used).
Gert-Jan
Michael C# wrote:[vbcol=seagreen]
> According to the Query Plans it looks like switching from "=" syntax to ">=
> AND <=" syntax cut down the Estimated Rows on my Clustered Index Seek from
> about 350,000 to about 7,000. That's the only difference - but wow, what a
> difference! Anyone have any ideas on why this happens, and better yet, why
> the Query Optimizer doesn't convert "=" to ">= AND <="? Now I'm wondering
> what kind of effect it will have on non-clustered indexes and non-indexed
> fields...
> Thanks
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:42601468.9462CCDF@.toomuchspamalready.nl...
|||DOB_Year is an INT. In addition, there are DOB_Month (INT) and DOB_Day
(INT) columns. I re-wrote all the predicates - did not try them
individually since this got me such a good result (no point in breaking it).
There is one Clustered Index on this table. No non-clustered indexes.
The Seek Parameters must be different then, and it appears to be caused by
the different operators used. Nothing else on the query, or on the table or
index have been changed. I just find this whole thing fascinating. I was
thinking this might be common knowledge I had missed out on somewhere along
the way. Anyways, thanks.
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:426022AB.7C56005D@.toomuchspamalready.nl...[vbcol=seagreen]
>I hope column DOB_Year doesn't happen to be a varchar? That would be an
> explanation.
> Does the performance also increase if you change just one of the
> predicates? Or does a rewrite of each predicate improve the performance?
> If the guess (above) about the data type turns out to be correct, then
> only the predicate with DOB_Year would make the difference.
> In addition, if the clustered index seek cuts down the estimated number
> of rows, then the seek parameters must be different (or a different
> index is used).
> Gert-Jan
>
> Michael C# wrote:
|||> (INT) columns. I re-wrote all the predicates - did not try them
> individually since this got me such a good result (no point in breaking
> it).
Might be worth trying it so you'll know exactly what is going on next time
this comes up. Wouldn't hurt to validate Gert-Jan's theories, either.
|||If the only index on the table is a clustered index, then only the
columns in this index are relevant. If any of the three column
(FirstName, LastName, DOB_Year) is not part of this index, then
rewriting them probably makes no difference.
When you look at the query plan, make sure you differentiate between
SEEK parameter and the predicates mentioned in the WHERE clause of the
SEEK operator. It is the SEEK parameter that primarily determines the
performance, because that determines which rows are read.
I must say, I am starting to get very curious to see the actual
(estimated) query plans of both queries. If you run "SET SHOWPLAN_TEXT
ON" before you run the query, then SQL-Server will only generate a query
plan. Please post both query plans. Maybe there is a data type mismatch
somewhere, maybe there is a bug (or flaw) that you have uncovered, maybe
we have overlooked something.
Gert-Jan
Michael C# wrote:
> DOB_Year is an INT. In addition, there are DOB_Month (INT) and DOB_Day
> (INT) columns. I re-wrote all the predicates - did not try them
> individually since this got me such a good result (no point in breaking it).
> There is one Clustered Index on this table. No non-clustered indexes.
> The Seek Parameters must be different then, and it appears to be caused by
> the different operators used. Nothing else on the query, or on the table or
> index have been changed. I just find this whole thing fascinating. I was
> thinking this might be common knowledge I had missed out on somewhere along
> the way. Anyways, thanks.
<snip>
|||OK, they just got the network up down there and I was able to run a few
tests. Here are the results for you:
Test 1:
-- This one (in the Graphical Execution Plan) displayed an Estimated Row
Count of 7,328. Ran in less than 1 sec.
SET SHOWPLAN_TEXT ON
GO
SELECT od.OffenderID
FROM Offender_Details od
WHERE od.LName = 'Smith'
AND od.FName = 'James'
AND od.DOB_Year >= 1960
AND od.DOB_Year <= 1960
GO
SET SHOWPLAN_TEXT OFF
GO
-- ShowPlan Results:
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName = 'James' AND od.DOB_Year >= 1960 AND od.DOB_Year <= 1960
-- |--Clustered Index
Seek(OBJECT[OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
AS [od]), SEEK[od].[LName]=[@.1] AND [od].[FName]=[@.2] AND [od].[DOB_Year]
>= [@.3] AND [od].[DOB_Year] <= [@.4]) ORDERED FORWARD)
Test 2:
-- This one (in the Graphical Execution Plan) displayed an Estimated Row
Count of 7,678. Ran in less than 1 sec.
SET SHOWPLAN_TEXT ON
GO
SELECT od.OffenderID
FROM Offender_Details od
WHERE od.LName = 'Smith'
AND od.FName >= 'James'
AND od.FName <= 'James'
AND od.DOB_Year = 1960
GO
SET SHOWPLAN_TEXT OFF
GO
--ShowPlan Results:
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName >= 'James' AND od.FName <= 'James' AND od.DOB_Year = 1960
-- |--Clustered Index
Seek(OBJECT[OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
AS [od]), SEEK[od].[LName]=[@.1] AND ([od].[FName], [od].[DOB_Year]) >=
([@.2], [@.4]) AND ([od].[FName], [od].[DOB_Year]) <= ([@.3], [@.4])),
WHERE[od].[DOB_Ye
Test 3:
-- This one (in the Graphical Execution Plan) displayed an Estimated Row
Count of 357,915. Took about 10 secs to run.
SET SHOWPLAN_TEXT ON
GO
SELECT od.OffenderID
FROM Offender_Details od
WHERE od.LName = 'Smith'
AND od.FName = 'James'
AND od.DOB_Year = 1960
GO
SET SHOWPLAN_TEXT OFF
GO
--SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith' AND
od.FName = 'James' AND od.DOB_Year = 1960
-- |--Clustered Index
Seek(OBJECT[OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
AS [od]), SEEK[od].[LName]=[@.1] AND [od].[FName]=[@.2] AND
[od].[DOB_Year]=[@.3]) ORDERED FORWARD)
The FName and LName columns are VARCHAR(32) NOT NULL. The DOB_Year column
is INT NOT NULL. OffenderID column is a BIGINT. It is the Primary Key, but
is non-clustered. It is not part of the Clustered Index. The table has a
clustered index of (FName, LName, DOB_Year). Table has about 15 million
rows in it currently. I tried the same type of thing with a non-clustered
index columns on a different table and ended up with less inspiring results.
I also tried it on a couple non-indexed columns, and confirmed (for myself
at least) that there was no point to that test
It seems like >= AND <= in place of = on the clustered index columns made a
significant difference for me. It would be cool if the Query Optimizer
would automatically take care of this conversion for me internally in this
situation. I would think SQL Server would recognize the >= AND <= and the =
as being equivalent and handle that when it generated the plan.
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:42602F3B.9612B18B@.toomuchspamalready.nl...
> If the only index on the table is a clustered index, then only the
> columns in this index are relevant. If any of the three column
> (FirstName, LastName, DOB_Year) is not part of this index, then
> rewriting them probably makes no difference.
> When you look at the query plan, make sure you differentiate between
> SEEK parameter and the predicates mentioned in the WHERE clause of the
> SEEK operator. It is the SEEK parameter that primarily determines the
> performance, because that determines which rows are read.
> I must say, I am starting to get very curious to see the actual
> (estimated) query plans of both queries. If you run "SET SHOWPLAN_TEXT
> ON" before you run the query, then SQL-Server will only generate a query
> plan. Please post both query plans. Maybe there is a data type mismatch
> somewhere, maybe there is a bug (or flaw) that you have uncovered, maybe
> we have overlooked something.
> Gert-Jan
>
> Michael C# wrote:
> <snip>
|||BTW, in the end, this particular query returns just one record... although
there are situations where I'll be pulling back as many as 500+ records with
a search like this. Thanks.
"Michael C#" <howsa@.boutdat.com> wrote in message
news:OYBUAVgQFHA.1392@.TK2MSFTNGP10.phx.gbl...
> OK, they just got the network up down there and I was able to run a few
> tests. Here are the results for you:
> Test 1:
> -- This one (in the Graphical Execution Plan) displayed an Estimated Row
> Count of 7,328. Ran in less than 1 sec.
> SET SHOWPLAN_TEXT ON
> GO
> SELECT od.OffenderID
> FROM Offender_Details od
> WHERE od.LName = 'Smith'
> AND od.FName = 'James'
> AND od.DOB_Year >= 1960
> AND od.DOB_Year <= 1960
> GO
> SET SHOWPLAN_TEXT OFF
> GO
> -- ShowPlan Results:
> --SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith'
> AND od.FName = 'James' AND od.DOB_Year >= 1960 AND od.DOB_Year <= 1960
> -- |--Clustered Index
> Seek(OBJECT[OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
> AS [od]), SEEK[od].[LName]=[@.1] AND [od].[FName]=[@.2] AND
> [od].[DOB_Year]
>
> Test 2:
> -- This one (in the Graphical Execution Plan) displayed an Estimated Row
> Count of 7,678. Ran in less than 1 sec.
> SET SHOWPLAN_TEXT ON
> GO
> SELECT od.OffenderID
> FROM Offender_Details od
> WHERE od.LName = 'Smith'
> AND od.FName >= 'James'
> AND od.FName <= 'James'
> AND od.DOB_Year = 1960
> GO
> SET SHOWPLAN_TEXT OFF
> GO
> --ShowPlan Results:
> --SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith'
> AND od.FName >= 'James' AND od.FName <= 'James' AND od.DOB_Year = 1960
> -- |--Clustered Index
> Seek(OBJECT[OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
> AS [od]), SEEK[od].[LName]=[@.1] AND ([od].[FName], [od].[DOB_Year]) >=
> ([@.2], [@.4]) AND ([od].[FName], [od].[DOB_Year]) <= ([@.3], [@.4])),
> WHERE[od].[DOB_Ye
> Test 3:
> -- This one (in the Graphical Execution Plan) displayed an Estimated Row
> Count of 357,915. Took about 10 secs to run.
> SET SHOWPLAN_TEXT ON
> GO
> SELECT od.OffenderID
> FROM Offender_Details od
> WHERE od.LName = 'Smith'
> AND od.FName = 'James'
> AND od.DOB_Year = 1960
> GO
> SET SHOWPLAN_TEXT OFF
> GO
> --SELECT od.OffenderID FROM Offender_Details od WHERE od.LName = 'Smith'
> AND od.FName = 'James' AND od.DOB_Year = 1960
> -- |--Clustered Index
> Seek(OBJECT[OffenderData].[dbo].[Offender_Details].[IX_Offender_Details]
> AS [od]), SEEK[od].[LName]=[@.1] AND [od].[FName]=[@.2] AND
> [od].[DOB_Year]=[@.3]) ORDERED FORWARD)
> The FName and LName columns are VARCHAR(32) NOT NULL. The DOB_Year column
> is INT NOT NULL. OffenderID column is a BIGINT. It is the Primary Key,
> but is non-clustered. It is not part of the Clustered Index. The table
> has a clustered index of (FName, LName, DOB_Year). Table has about 15
> million rows in it currently. I tried the same type of thing with a
> non-clustered index columns on a different table and ended up with less
> inspiring results. I also tried it on a couple non-indexed columns, and
> confirmed (for myself at least) that there was no point to that test
> It seems like >= AND <= in place of = on the clustered index columns made
> a significant difference for me. It would be cool if the Query Optimizer
> would automatically take care of this conversion for me internally in this
> situation. I would think SQL Server would recognize the >= AND <= and the
> = as being equivalent and handle that when it generated the plan.
>
> "Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
> news:42602F3B.9612B18B@.toomuchspamalready.nl...
>

No comments:

Post a Comment