Monday, May 6, 2013

Multiple conditions in ON clause when using LEFT JOIN

We are familiar with SQL joins with only one join condition, the foreign key in one table equal to the referenced column in another table. In fact, we can do more than that by specifying extra conditions. To demonstrate that, we're gonna use two tables that will be joined later, the orders and order_items, as the order_id in order_items table references the id in order table.

orders

order_items








We've already got bunch of records in both tables.
First, let's specify one condition in the ON clause, the join column, and show the result.

SELECT * FROM orders AS o LEFT JOIN order_items AS i ON o.id = i.order_id





Fig.1



Now let's add one more condition, the name of the order_items must be ''Toast", and we got this.

SELECT * FROM orders AS o LEFT JOIN order_items AS i ON o.id = i.order_id AND i.name = "Toast"




Fig.2


Comparing Fig.1 and Fig.2, we can see that the addition of the second condition in ON clause disqualifies any records in order_items table whose name is not "Toast" as the join candidates. So the first order record ended up with no order items attached to it, thus having NULL value in each column of order item.

It's easy to think that (at least for me for the first time) moving the second condition in ON clause to WHERE clause would have the same effect.

SELECT * FROM orders AS o LEFT JOIN order_items AS i ON o.id = i.order_id WHERE i.name = "Toast"

After we run the SQL command above, we got this.





Obviously, they are different.

The essential difference between these two commands is that the first one eliminate any order item records that do not meet the condition BEFORE join, while all the condition in WHERE clause remove any records that do not meet the condition AFTER the join is done.


Monday, May 7, 2012

How to select the row with the max/min number in each group after GROUP BY using Oracle



The scenario is described as below:

Assume an insurance company maintains two tables: Service and Policy. The Service table has two attributes -  serviceID and name. The Policy table has three attributes - policyID, servideID, customerName and year. Now let's create these two tables.

CREATE TABLE Service(
serviceID CHAR(2) PRIMARY KEY,
name VARCHAR2(50) NOT NULL
);
CREATE TABLE Policy(
policyID CHAR(2) PRIMARY KEY,
serviceID CHAR(2) NOT NULL,
customerName VARCHAR2(50) NOT NULL,
year NUMBER(4) NOT NULL,
FOREIGN KEY (serviceID)
REFERENCES Service(serviceID) 
);

Insert rows into Service table, 
INSERT INTO Service VALUES('01', 'Home Insurance');
INSERT INTO Service VALUES('02', 'Auto Insurance');
INSERT INTO Service VALUES('03', 'Personal Insurance');
Insert rows into Customer table,
INSERT INTO Policy VALUES('01', '01', 'David', 2012);
INSERT INTO Policy VALUES('02', '01', 'John', 2012);
INSERT INTO Policy VALUES('03', '01', 'Alex', 2012);
INSERT INTO Policy VALUES('04', '01', 'Bill', 2011);
INSERT INTO Policy VALUES('05', '02', 'William', 2011);
INSERT INTO Policy VALUES('06', '02', 'Jack', 2011);
INSERT INTO Policy VALUES('07', '02', 'Steve', 2011);
INSERT INTO Policy VALUES('08', '02', 'Joe', 2010);
INSERT INTO Policy VALUES('09', '03', 'Jane', 2009);
INSERT INTO Policy VALUES('10', '03', 'Sam', 2009);
INSERT INTO Policy VALUES('11', '03', 'Jim', 2009);
INSERT INTO Policy VALUES('12', '03', 'Tim', 2010);

Now let's start to write the query that returns, for each Service department, the year in which the largest number of policy signed. You need to return the serviceID, the year and the number of the policies for that service and for that year. 

The first thing that come to our mind is to write query like below. 
------------Q1----------------
SELECT serviceID, year, COUNT(*) AS Num_Of_Policy 
FROM Policy
GROUP BY serviceID, year
ORDER BY serviceID, year;
This query returns T1:
                        

The next thing we need to do is to select the year with the largest number of policies based on the first query. 
------------Q2----------------
SELECT serviceID, MAX(Num_Of_Policy) AS Max_Num_Of_Policy
FROM 
(SELECT serviceID, year, COUNT(*) AS Num_Of_Policy 
FROM Policy
GROUP BY serviceID, year
ORDER BY serviceID, year)
GROUP BY serviceID;
This query will return T2.

It's tempting to write the following two queries, but neither of them works. 
1) 
SELECT serviceID, year, MAX(Num_Of_Policy) AS Max_Num_Of_Policy
FROM 
(SELECT serviceID, year, COUNT(*) AS Num_Of_Policy 
FROM Policy
GROUP BY serviceID, year
ORDER BY serviceID, year)
GROUP BY serviceID, year
ORDER BY serviceID, year;
, which returns T1
2) 
SELECT serviceID, year, MAX(Num_Of_Policy) AS Max_Num_Of_Policy
FROM 
(SELECT serviceID, year, COUNT(*) AS Num_Of_Policy 
FROM Policy
GROUP BY serviceID, year
ORDER BY serviceID, year)
GROUP BY serviceID;
, which is not in correct syntax.

We are not finished yet at this point, because we still need one more field, the year. We achieve this by joining two T1 and T2. Note the joining field are serviceID and Num_Of_Policy. 

SELECT T2.serviceID, T1.year, T2. Max_Num_Of_Policy
FROM 
----------------Q1------------------
(SELECT serviceID, year, COUNT(*) AS Num_Of_Policy 
FROM Policy
GROUP BY serviceID, year
ORDER BY serviceID, year) T1, 
----------------Q2------------------
(SELECT serviceID, MAX(Num_Of_Policy) AS Max_Num_Of_Policy
FROM 
(SELECT serviceID, year, COUNT(*) AS Num_Of_Policy 
FROM Policy
GROUP BY serviceID, year
ORDER BY serviceID, year)
GROUP BY serviceID) T2
WHERE T1.serviceID=T2.serviceID
AND T1.Num_Of_Policy=T2.Max_Num_Of_Policy;
The final result is
Conclusion

The tricky thing about this query is that you need to add the field, year,  back to the final result, which is eliminated in Q2. We achieve this by performing Q1 again, and joining the two resulting table. As far as I know, there is no way to achieve this query within two steps. The join between T1 and T2 is necessary. 


"Mutating Table" Exception in Oracle



One common problem a new trigger writer will frequently run into is the "mutating table" exception. This exception is caused when the trigger is trying to query the table that is being changed or is possible to be changed. Here is a very simple example to demonstrate "mutating table" exception. (The queries are in the file "Mutating_Table_Exception.txt")


Assume you have three tables,
Student(ID, name)
Course(ID, name)
Registration(studentID, courseID, grade)
with studentID and courseID referencing table Student and Course. 
The restriction we want our trigger to be built on is that each course has a maximum capacity of 10 students. 
It's common for us to build a trigger like below: 
CREATE OR REPLACE TRIGGER Max_Course_Capacity 
BEFORE INSERT OR UPDATE OF courseID ON Registration 
FOR EACH ROW 
DECLARE 
num_of_student_registered NUMBER; 
BEGIN 
SELECT COUNT(*) INTO num_of_student_registered 
FROM Registration 
WHERE courseID=:NEW.courseID;  
IF num_of_student_registered >=10 THEN 
raise_application_error(-20000, 'Cannot perform this query: The number of students 
registered in a certain course cannot exceed 10 '); 
END IF; 
END; 
/

This trigger works fine with insert operation.

But for update, the "mutating table" exception will be raised. 

I tried two approaches trying to work around "mutating table" exception, and one works and the other does not. They are listed as below. 

Approach 1 (Does not work): 

Split the original trigger into two, one for inserting and one for updating, and change the BEFORE key word to AFTER for the updating trigger. If the newly updated row makes the number of students of one course greater than the capacity, then the trigger will use  a SQL query to change that newly updated row back to its original value. The code are shown below: 

CREATE OR REPLACE TRIGGER Max_Course_Capacity_1 
BEFORE INSERT ON Registration 
FOR EACH ROW 
DECLARE 
num_of_student_registered NUMBER; 
BEGIN 
SELECT COUNT(*) INTO num_of_student_registered 
FROM Registration 
WHERE courseID=:NEW.courseID;  
IF num_of_student_registered >=10 THEN 
raise_application_error(-20000, 'Cannot perform this query: The number of students 
registered in a certain course cannot exceed 10 '); 
END IF; 
END; 
/

CREATE OR REPLACE TRIGGER Max_Course_Capacity_2 
AFTER UPDATE OF courseID ON Registration 
FOR EACH ROW 
DECLARE 
num_of_student_registered NUMBER; 
BEGIN 
SELECT COUNT(*) INTO num_of_student_registered 
FROM Registration 
WHERE courseID=:NEW.courseID;  
IF num_of_student_registered >=10 THEN 
raise_application_error(-20000, 'Cannot perform this query: The number of students 
registered in a certain course cannot exceed 10 '); 
--change the newly updated row back to its original value

UPDATE Registration SET courseID=:OLD.courseID WHERE studentID=:NEW.studentID AND courseID=:NEW.courseID;

END IF; 
END; 
/

Approach 2 (Works)

This is an easy way to make "mutating table" exception disappear. We need to use the PRAGMA EXCEPTION_INIT to bind the exception we defined to a particular error number, -4091 in this case. Below is the trigger body.

CREATE OR REPLACE TRIGGER Max_Course_Capacity 
BEFORE INSERT OR UPDATE OF courseID ON Registration 
FOR EACH ROW 
DECLARE 
num_of_student_registered NUMBER;
mutating_table EXCEPTION; 
PRAGMA EXCEPTION_INIT(mutating_table, -4091);
BEGIN 
SELECT COUNT(*) INTO num_of_student_registered 
FROM Registration 
WHERE courseID=:NEW.courseID;  
IF num_of_student_registered >=10 THEN 
raise_application_error(-20000, 'Cannot perform this query: The number of students 
registered in a certain course cannot exceed 10 '); 
END IF; 

EXCEPTION WHEN mutating_table THEN
raise_application_error(-20000, 'Cannot perform this query: The number of students 
registered in a certain course cannot exceed 10 ');
END; 
/

When a "mutating table" exception is raised, we use the function "raise_application_error" to indicate that this query is illegal. 


Conclusion

This simple example didn't show all the aspects of "mutating table" exception. For example, "mutating table" exception will not occur if only a single record is inserted in the table, but it will occur for multiple row insertion. In addition, "mutating table" exception is not only encountered during queries, but also for insert, updates and deletes present in the trigger. The basic reason for "mutating" exception is the way Oracle manages a read consistent view of data.


The "mutating table" exception is usually the result of a poor application design and mutating triggers should be avoided whenever possible. Thus the best way to work around "mutating table" exception in our example is to add a field to record the number of students enrolled in each course into the Course table. When a insertion of update is fired on table Registration, instead of querying table Registration, the trigger will query table Course, thus avoid "mutating table" exception.