Thursday, October 13, 2011

how to ensure ranking in sql

Whilst using mysql, I was inspired by our faculty to figure out how to assign ranks to the entries. A little bit of internet browsing, a lot of head scratching, and some innovatively redundant use of table manipulation later, I succeeded.

Point to be noted, we MUST not use the 'rank' structure of transact-sql. Instead we use and manipulate simple sql queries.

Here is the tutorial, please contact me at nardz07@gmail.com





Suppose we have a simple table:


This table emp has ten records as shown above that need to be sorted.

Firstly, if we do not use the transact-sql rank over() command, that is available only in sql server, we must manipulate the table using simpler commands.

Let us first see the simplest way to go about it.

Simply enter the following query to get rank:

select a.eid, a.ename,a.esalary,count(b.esalary) rank from emp a, emp b where a.esalary <= b.esalary or (a.esalary = b.esalary and a.ename = b.ename) group by a.ename,a.esalary order by a.esalary desc, a.ename desc;

This will give us the result:


Before I explain the code, let us clear the requirements. The ranking must be such that the highest paid employee gets the most significant (numerically the smallest) rank.

No two employees must share the same rank unless they draw equal salary, and then it must be ensured their rank placing affects those drawing lesser salary than them.

That is, say if two people are tied after the second place, then either both must be placed third/fourth and the person coming in next must be 5th place only.

Now, let us study the coding.

To display rank in SQL, the idea is to do a self-join, list out the results in order, and do a count on the number of records that's listed ahead of (and including) the record of interest.

Let's focus on the WHERE clause. The first part of the clause, (a.esalary <= b.esalary), makes sure we are only counting the number of occurrences where the value in the Sales column is less than or equal to itself. If there are no duplicate values in the Sales column, this portion of the WHERE clause by itself would be sufficient to generate the correct ranking.

The second part of the clause, (a.esalary=b.esalary and a.ename = b.ename), ensures that when there are duplicate values in the Sales column, each one would get the correct rank.

What must be noted here, is that we have completely avoided using any kind of join in the above code, even though we technically are following the self join logic.

More detailed explanation and analysis into this later. Please contact me with your suggestions and doubts. Till next time, mate!

No comments:

Post a Comment