SQL: Duplicate Check with CTE
Author: jason
Date: 2014-01-28 11:32:33
Category: Musing

I am still using the ROW_NUMBER() OVER PARTITION BY Window Function, but I've added a Common Table Expression (CTE).

The Over Clause: http://technet.microsoft.com/en-us/library/ms189461.aspx

Row Number is assigned on the partition results which is kind of a "group by" for the value specified. The Row Number starts over at 1 for each unique value. If a value is listed 3 times then the Rows will be 1, 2, 3. Pretty cool for looking for duplicate entries.





If we add a WHERE clause to the end of the CTE SELECT we can get out the specific row we are interested in.




Stepping through the code

Line 1 is the Temporary Table to hold values. The 1st column is an [ID] that is a BIGINT with the IDENTITY property that starts with 101 and auto increments by 1. The 2nd column is a [NAME] that is a VARCHAR that is limited to a length of 10.




Line 2 is the INSERT to populate the Temporary Table with Fruit Names. Note that I am putting Apple in twice.




Line 4 is the start (or kind of declaration) for the Common Table Expression (CTE). You don't have to use CTE as an Alias. Most any characters will be fine. For Instance WITH tt AS ( would work too.




Lines 5-9 handle the SELECT out of the Temporary Table.
Lines 6-8 are the Window Function for calculating the ROW_NUMBER. You aren't limited to partitioning by only 1 column, more can be added to determine the count/grouping. i.e, name, date
Line 7 is the ORDER of the ROW_NUMBER and can be adjusted with ASC or DESC.
Line 8 is the Alias of "row/column" for the Window Function.
Line 9 is getting everything out of the Temporary Table.




Line 10 ends the CTE, and then does a SELECT from it.




Lines 4 & 10 (the CTE) are not at all necessary for determining ROW_NUMBER or duplication. That is handled by the Window Function. But using the CTE lets us put a WHERE clause onto the results to just get out the Row Numbers greater than 1, i.e. the duplicate values.




Code

DECLARE @Temp AS TABLE ([id] BIGINT IDENTITY(101,1), [name] VARCHAR(10));
INSERT INTO @Temp VALUES ('Apple'), ('Apple'), ('Peach'), ('Pear');

WITH CTE AS (
SELECT t.id, t.name, ROW_NUMBER() OVER (PARTITION BY t.name ORDER BY t.id ASC) AS Row_Num
FROM @Temp t
) SELECT * FROM CTE WHERE Row_Num > 1;




jason @ jasonthomasfrance.com - www.masterstationlog.com - copyright 2009