Wednesday, October 10, 2007

SQL Magic Part 1 - Select Distinct Tricks

Select Distinct is possibly one of the most useful SQL tools but one of the most flawed. Select Distinct will give you only 1 column of data. Why is that? Shouldn't it be able to do a distinct selection on a target column and also return the other row's columns? One would think so but this is not the case.
If you need to grab all columns out of a table while doing a select distinct you can try something like this:
SELECT * FROM DUPES AS D3 WHERE 1= (SELECT COUNT (*) FROM DUPES AS D2 WHERE D3.ID=D2.ID AND D3.NAME <> D2.NAME)
This gives you all the records (and all columns) where there is only 1 instance of an item. You can adjust the 1 = ( Select to a 2>= to get all items that have 2 or or fewer instances of the data you are looking for.
You can do a multi-column distinct query using a sub-select for example:
SELECT DISTINCT (ID) FROM DUPES AS D3 WHERE 1= (SELECT COUNT (*) FROM DUPES AS D2 WHERE D3.ID=D2.ID AND D3.NAME = D2.NAME AND d3.MEMBER = d4.MEMBER)
Of course we could do even more fun stuff given another level of sub-select query but SQL will not go more than 1 level in sub-selects. If we could do a two level sub select we could return all columns for a given distinct query by appending a select * from dupes where 1= (select distinct.... (select count(*)...))
Granted we can write stored procedures to do some of this stuff programmatically but deficiencies in the Distinct function are significant and cost a great deal of time in work-arounds. If new SQL standards come out any time soon they should include a Distinct function that allows for a multi column distinct specification AND wildcards.... ex: SELECT *,DISTINCT (ID, MEMBER_KEY) FROM... or perhaps better... SELECT * FROM DUPES HAVING DISTINCT(ID,MEMBER_KEY)