I've got two tables with email addresses. I'm trying to return the result set from one when the email address is not contained in the other. Here's my select statement
SELECT * FROM ITSJ_Pivot WHERE Email not IN (SELECT email FROM lists_master)
Returns 0 rows
Here are the results from the individual select statements for each table:
select NUSOID from lists_master where email ='jon@.lackeydesign.com'
Returns 0 rows
select NUOSID from itsj_pivot where email ='jon@.lackeydesign.com'
1-8174
How can this be? The email is in the ITSJ_Pivot table and not in Lists_Master but the "not in" statement does not return the rows. Please help. Thanks.
Most likely this query:
SELECT email FROM lists_master
Will return a null value for one of the rows.
If so, then IN will always evaluate to either TRUE (it is in there) or UNKNOWN. So either it is NOT(TRUE) or it is NOT (UNKNOWN) which results in FALSE or UNKNOWN respectively:
create table lists_master
( --demo only, should have PKEY
email varchar(200) NULL
)
insert into lists_master
values (NULL)
insert into lists_master
values ('email@.address.com')
go
create table itsj_pivot
( --demo only, should have PKEY
email varchar(200) NOT NULL
)
insert into itsj_pivot
values ('email@.address.com')
insert into itsj_pivot
values ('anotheremail@.address.com')
go
SELECT * FROM ITSJ_Pivot WHERE Email not IN (SELECT email FROM lists_master)
No rows
SELECT * FROM ITSJ_Pivot WHERE Email not IN (SELECT email FROM lists_master where email is not null)
1 row:
email
-
anotheremail@.address.com
The best way to write this is:
SELECT *
FROM ITSJ_Pivot
WHERE not exists (SELECT email
FROM lists_master
WHERE lists_master.email = ITSJ_Pivot.email)
In this case the NULL isn't a problem because EXISTS looks for the existance of a joined row, so the NULL condition is handled down in the WHERE of the subquery rather than in the comparison operator IN (working with NOT() logic and NULL comparisons can be really annoying.
|||Better to do a not exists. For one, it is generally quicker as you are creating a join between the two tables. For further explanation, see my blog below:|||Both NOT IN and NOT EXISTS are flattened to joins. In fact, in lot of cases you will get the same execution plan. It is just that the semantics for the constructs are different and how the checks are performed. Using EXISTS for existence/non-existence check is always the best approach.|||SELECT p.* FROM ITSJ_Pivot p
LEFT JOIN lists_master m ON p.email = m.email
WHERE m.NUSOID IS NULL
I prefer the join syntax. For me it is more clear.
|||That worked. Thanks for the help.|||The problem with the join syntax is a couple of things:
1. You have both columns in the output:
SELECT * FROM ITSJ_Pivot p
LEFT JOIN lists_master m ON p.email = m.email
WHERE m.NUSOID IS NULL
Change to select p.* (or expand the columns, which is best practice) and this isn't a concernt.
2. What about the opposite question, show me all of the ITSJ_Pivot rows that do have an email address? Now you have to include distinct to get unique results:
SELECT DISTINCT p.* FROM ITSJ_Pivot p
LEFT JOIN lists_master m ON p.email = m.email
WHERE m.NUSOID IS NOT NULL
You can use a join, but what if you don't need to data in the lists_master table in the results? This can be more costly to execute (especially if indexes cannot be chosen to make the operation easy to execute)
3. The most important thing to me is that the JOIN syntax doesn't ask the question in a clear manner. IN is the clearest:
Give me ITSJ_Pivot rows where the email address is in the lists_master table
though it only works when you have no nulls and you are working with one column. EXISTS expresses what you are looking for:
Give me ITSJ_Pivot rows where there exists a email address match in the lists_master table.
This is even more important when you are not just working with two tables. The exists condition works if you have a 30 table join or just 2. The join introduces extraneous data or NULLs that can be confusing.
No comments:
Post a Comment