Monday, January 4, 2010

Returning Nulls in the Resulting Set of a Subquery

The SQL statement on the slide attempts to display all the employees who do not have any subordinates.

Logically, this SQL statement should have returned 12 rows. However, the SQL statement does not return any rows. One of the values returned by the inner query is a null value, and hence the entire query returns no rows. The reason is that all conditions that compare a null value result in a null. So whenever null values are likely to be part of the results set of a subquery, do not use the NOT IN operator. The NOT IN operator is equivalent to <> ALL.

Notice that the null value as part of the results set of a subquery is not a problem if you use the IN
operator. The IN operator is equivalent to =ANY. For example, to display the employees who have subordinates, use the following SQL statement:

SELECT emp.last_name FROM employees emp WHERE emp.employee_id IN

(SELECT mgr.manager_id
FROM employees mgr);

Alternatively, a WHERE clause can be included in the subquery to display all employees who do not have any subordinates:

SELECT last_name FROM employees

WHERE employee_id NOT IN
(SELECT manager_id
FROM employees
WHERE manager_id IS NOT NULL);

Illegal Queries Using Group Functions

The WHERE clause cannot be used to restrict groups. The SELECT statement on the slide results in an error because it uses the WHERE clause to restrict the display of average salaries of those departments that have an average salary greater than $8,000.

You can correct the slide error by using the HAVING clause to restrict groups.


SELECT department_id, AVG(salary)
FROM employees
HAVING AVG(salary) > 8000
GROUP BY department_id;

Creating Groups of Data: The GROUP BY Clause Syntax

The GROUP BY Clause
You can use the GROUP BY clause to divide the rows in a table into groups. You can then use the group functions to return summary information for each group.
In the syntax:

group_by_expression specifies columns whose values determine the basis for
grouping rows Guidelines

• If you include a group function in a SELECT clause, you cannot select individual results as well, unless the individual column appears in the GROUP BY clause. You receive an error message if you fail to include the column list in the GROUP BY clause.
• Using a WHERE clause, you can exclude rows before dividing them into groups.
• You must include the columns in the GROUP BY clause.
• You cannot use a column alias in the GROUP BY clause.
• By default, rows are sorted by ascending order of the columns included in the GROUP BY list.

You can override this by using the ORDER BY clause.

Using the DISTINCT Keyword

Use the DISTINCT keyword to suppress the counting of any duplicate values within a column.

The example on the slide displays the number of distinct department values in the EMPLOYEES table

All group functions ignore null values in the column. In the slide example, the average is calculated based only on the rows in the table where a valid value is stored in the COMMISSION_PCT column.

The average is calculated as the total commission paid to all employees divided by the number of employees receiving a commission (four).

Using the AVG and SUM Functions

Group Functions

You can use AVG, SUM, MIN, and MAX functions against columns that can store numeric data. The example on the slide displays the average, highest, lowest, and sum of monthly salaries for all sales representatives.

Creating Three-Way Joins with the ON ClauseCreating Three-Way Joins with the ON Clause

A three-way join is a join of three tables. In SQL: 1999 compliant syntax, joins are performed from left to right so the first join to be performed is EMPLOYEES JOIN DEPARTMENTS. The first join condition can reference columns in EMPLOYEES and DEPARTMENTS but cannot reference columnsin LOCATIONS. The second join condition can reference columns from all three tables.


This can also be written as a three-way equijoin:

SELECT employee_id, city, department_name
FROM employees, departments, locations
WHERE employees.department_id = departments.department_id
AND departments.location_id = locations.location_id;


SELECT e.employee_id, l.city, d.department_name
FROM employees e JOIN departments d USING (department_id)
JOIN locations l USING (location_id);

Creating Joins with the ON Clause

• The join condition for the natural join is basically an equijoin of all columns with the same name.
• To specify arbitrary conditions or specify columns to join, the ON clause is used.
• The join condition is separated from other search conditions.
• The ON clause makes code easy to understand.

The ON Condition

Use the ON clause to specify a join condition. This lets you specify join conditions separate from any search or filter conditions in the WHERE clause.

Retrieving Records with the USING Clause

The example shown joins the DEPARTMENT_ID column in the EMPLOYEES and DEPARTMENTS tables, and thus shows the location where an employee works.

This can also be written as an equijoin:


SELECT employee_id, last_name,
employees.department_id, location_id
FROM employees, departments
WHERE employees.department_id = departments.department_id;

Creating Joins with the USING Clause

• If several columns have the same names but the data types do not match, the NATURAL JOIN
clause can be modified with the USING clause to specify the columns that should be used for an equijoin.
• Use the USING clause to match only one column when more than one column matches.
• Do not use a table name or alias in the referenced columns.
• The NATURAL JOIN and USING clauses are mutually exclusive.

Natural joins use all columns with matching names and data types to join the tables. The USING clause can be used to specify only those columns that should be used for an equijoin. The columns referenced in the USING clause should not have a qualifier (table name or alias) anywhere in the SQL statement.

For example, this statement is valid:

SELECT l.city, d.department_name
FROM locations l JOIN departments d USING (location_id)
WHERE location_id = 1400;


This statement is invalid because the LOCATION_ID is qualified in the WHERE clause:

SELECT l.city, d.department_name
FROM locations l JOIN departments d USING (location_id)
WHERE d.location_id = 1400;

ORA-25154: column part of USING clause cannot have qualifier


The same restriction applies to NATURAL joins also. Therefore columns that have the same name in both tables have to be used without any qualifiers.

Retrieving Records with Natural Joins

In the example on the slide, the LOCATIONS table is joined to the DEPARTMENT table by the LOCATION_ID column, which is the only column of the same name in both tables. If other common columns were present, the join would have used them all.

Equijoins

The natural join can also be written as an equijoin:


SELECT department_id, department_name,
departments.location_id, city
FROM departments, locations
WHERE departments.location_id = locations.location_id;

Natural Joins with a WHERE Clause
Additional restrictions on a natural join are implemented by using a WHERE clause. The example below limits the rows of output to those with a department ID equal to 20 or 50.

SELECT department_id, department_name,

location_id, city
FROM departments
NATURAL JOIN locations
WHERE department_id IN (20, 50);

Creating Natural JoinsCreating Natural Joins


• The NATURAL JOIN clause is based on all columns in the two tables that have the same name.


• It selects rows from the two tables that have equal values in all matched columns.

• If the columns having the same names have different data types, an error is returned.

Joining a Table to Itself

Sometimes you need to join a table to itself. To find the name of each employee’s manager, you need to join the EMPLOYEES table to itself, or perform a self join. For example, to find the name of Whalen’s manager, you need to:

• Find Whalen in the EMPLOYEES table by looking at the LAST_NAME column.

• Find the manager number for Whalen by looking at the MANAGER_ID column. Whalen’s manager number is 101.

• Find the name of the manager with EMPLOYEE_ID 101 by looking at the LAST_NAME column. Kochhar’s employee number is 101, so Kochhar is Whalen’s manager.

In this process, you look in the table twice. The first time you look in the table to find Whalen in the LAST_NAME column and MANAGER_ID value of 101. The second time you look in the EMPLOYEE_ID column to find 101 and the LAST_NAME column to find Kochhar.

Returning Records with No Direct Match with Outer Joins

If a row does not satisfy a join condition, the row will not appear in the query result. For example, in the equijoin condition of EMPLOYEES and DEPARTMENTS tables, employee Grant does not appear because there is no department ID recorded for her in the EMPLOYEES table. Instead of seeing 20 employees in the result set, you see 19 records.


SELECT e.last_name, e.department_id, d.department_name FROM employees e, departments d WHERE e.department_id = d.department_id;

The DECODE Function

The DECODE function decodes an expression in a way similar to the IF-THEN-ELSE logic used in various languages. The DECODE function decodes expression after comparing it to each search value. If the expression is the same as search, result is returned.

If the default value is omitted, a null value is returned where a search value does not match any of the result values.


In the preceding SQL statement, the value of JOB_ID is tested. If JOB_ID is IT_PROG, the salary increase is 10%; if JOB_ID is ST_CLERK, the salary increase is 15%; if JOB_ID is SA_REP, the salary increase is 20%. For all other job roles, there is no increase in salary.

The same statement can be expressed in pseudocode as an IF-THEN-ELSE statement:

IF job_id = ’IT_PROG’ THEN salary = salary*1.10
IF job_id = ’ST_CLERK’ THEN salary = salary*1.15
IF job_id = ’SA_REP’ THEN salary = salary*1.20
ELSE salary = salary

Using the COALESCE Function

• The advantage of the COALESCE function over the NVL function is that the COALESCE function can take multiple alternate values.

• If the first expression is not null, it returns that expression; otherwise, it does a COALESCE of the remaining expressions.


The COALESCE Function.

The COALESCE function returns the first non-null expression in the list.

Syntax

COALESCE (expr1, expr2, ... exprn)


In the syntax:


expr1 returns this expression if it is not null
expr2 returns this expression if the first expression is null and this expression is not null exprn returns this expression if the preceding expressions are null

The TO_NUMBER and TO_DATE Functions (continued)

Display the names and hire dates of all the employees who joined on May 24, 1999. Because the fx modifier is used, an exact match is required and the spaces after the word ‘May’ are not recognized.

SELECT last_name, hire_date FROM employees
WHERE hire_date = TO_DATE(’May 24, 1999’, ’fxMonth DD, YYYY’);

Sorting by Multiple Columns

You can sort query results by more than one column. The sort limit is the number of columns in the given table.

In the ORDER BY clause, specify the columns, and separate the column names using commas. If you want to reverse the order of a column, specify DESC after its name. You can also order by columns that are not included in the SELECT clause.

Example

Display the last names and salaries of all employees. Order the result by department number, and then in descending order by salary.

SELECT last_name, salary FROM employees ORDER BY department_id, salary DESC;

Example of the Precedence of the AND Operator

In the slide example, there are two conditions:

• The first condition is that the job ID is AD_PRES and the salary is greater than 15,000.


• The second condition is that the job ID is SA_REP.

Therefore, the SELECT statement reads as follows:

“Select the row if an employee is a president and earns more than $15,000, or if the employee is a sales representative.”

Logical Conditions in SQL

A logical condition combines the result of two component conditions to produce a single result based on them or inverts the result of a single condition. A row is returned only if the overall result of the condition is true. Three logical operators are available in SQL:

• AND
• OR
• NOT

All the examples so far have specified only one condition in the WHERE clause. You can use several conditions in one WHERE clause using the AND and OR operators.

Using the LIKE Condition in SQL

• Use the LIKE condition to perform wild card searches of valid search string values.
• Search conditions can contain either literal characters or numbers:


– % denotes zero or many characters.
– denotes one character.

SELECT first_name
FROM employees
WHERE first_name LIKE ’S%’;
What is RDBMS?

In recent years, database management systems (DBMS) have established themselves as the primary means of data storage for information systems ranging from large commercial transaction processing applications to PC-based desktop applications. At the heart of most of today's information systems is a relational database management system (RDBMS). RDBMSs have been the workhorse for data management operations for over a decade and continue to evolve and mature, providing sophisticated storage, retrieval, and distribution functions to enterprise-wide data processing and information management systems. Compared to the file systems, relational database management systems provide organizations with the capability to easily integrate and leverage the massive amounts of operational data into meaningful information systems. The evolution of high-powered database engines such as Oracle7 has fostered the development of advanced "enabling" technologies including client/server, data warehousing, and online analytical processing, all of which comprise the core of today's state-of-the-art information management systems.

Examine the components of the term relational database management system. First, a database is an integrated collection of related data. Given a specific data item, the structure of a database facilitates the access to data related to it, such as a student and all of his registered courses or an employee and his dependents. Next, a relational database is a type of database based in the relational model; non-relational databases commonly use a hierarchical, network, or object-oriented model as their basis. Finally, a relational database management system is the software that manages a relational database. These systems come in several varieties, ranging from single-user desktop systems to full-featured, global, enterprise-wide systems, such as Oracle7.

This blog discusses the basic elements of a relational database management system, the relational database, and the software systems that manage it. Also included is a discussion of nonprocedural data access. If you are a new user to relational database technology, you'll have to change your thinking somewhat when it comes to referencing data nonprocedurally.

The Relational Database Model

Most of the database management systems used by commercial applications today are based on one of three basic models: the hierarchical model, the network model, or the relational model. The following sections describe the various differences and similarities of the models.

Hierarchical and Network Models


The first commercially available database management systems were of the CODASYL type, and many of them are still in use with mainframe-based, COBOL applications. Both network and hierarchical databases are quite complex in that they rely on the use of permanent internal pointers to relate records to each other. For example, in an accounts payable application, a vendor record might contain a physical pointer in its record structure that points to purchase order records. Each purchase order record in turn contains pointers to purchase order line item records.

The process of inserting, updating, and deleting records using these types of databases requires synchronization of the pointers, a task that must be performed by the application. As you might imagine, this pointer maintenance requires a significant amount of application code (usually written in COBOL) that at times can be quite cumbersome.

Elements of the Relational Model


Relational databases rely on the actual attribute values as opposed to internal pointers to link records. Instead of using an internal pointer from the vendor record to purchase order records, you would link the purchase order record to the vendor record using a common attribute from each record, such as the vendor identification number.

Although the concepts of academic theory underlying the relational model are somewhat complex, you should be familiar with are some basic concepts and terminology. Essentially, there are three basic components of the relational model: relational data structures, constraints that govern the organization of the data structures, and operations that are performed on the data structures.

Relational Data Structures

The relational model supports a single, "logical" structure called a relation, a two-dimensional data structure commonly called a table in the "physical" database. Attributes represent the atomic data elements that are related by the relation. For example, the Customer relation might contain such attributes about a customer as the customer number, customer name, region, credit status, and so on.


Key Values and Referential Integrity


Attributes are grouped with other attributes based on their dependency on a primary key value. A primary key is an attribute or group of attributes that uniquely identifies a row in a table. A table has only one primary key, and as a rule, every table has one. Because primary key values are used as identifiers, they cannot be null. Using the conventional notation for relations, an attribute is underlined to indicate that it is the primary key of the relation. If a primary key consists of several attributes, each attribute is underlined.

You can have additional attributes in a relation with values that you define as unique to the relation. Unlike primary keys, unique keys can contain null values. In practice, unique keys are used to prevent duplication in the table rather than identify rows. Consider a relation that contains the attribute, United States Social Security Number (SSN). In some rows, this attribute may be null in since not every person has a SSN; however for a row that contains a non-null value for the SSN attribute, the value must be unique to the relation.

Linking one relation to another typically involves an attribute that is common to both relations. The common attributes are usually a primary key from one table and a foreign key from the other. Referential integrity rules dictate that foreign key values in one relation reference the primary key values in another relation. Foreign keys might also reference the primary key of the same relation. Figure illustrates two foreign key relationships.



Oracle and Client/Server


Oracle Corporation's reputation as a database company is firmly established in its full-featured, high-performance RDBMS server. With the database as the cornerstone of its product line, Oracle has evolved into more than just a database company, complementing its RDBMS server with a rich offering of well-integrated products that are designed specifically for distributed processing and client/server applications. As Oracle's database server has evolved to support large-scale enterprise systems for transaction processing and decision support, so too have its other products, to the extent that Oracle can provide a complete solution for client/server application development and deployment. This chapter presents an overview of client/server database systems and the Oracle product architectures that support their implementation.

An Overview of Client/Server Computing

The premise of client/server computing is to distribute the execution of a task among multiple processors in a network. Each processor is dedicated to a specific, focused set of subtasks that it performs best, and the end result is increased overall efficiency and effectiveness of the system as a whole. Splitting the execution of tasks between processors is done through a protocol of service requests; one processor, the client, requests a service from another processor, the server. The most prevalent implementation of client/server processing involves separating the user interface portion of an application from the data access portion.

On the client, or front end, of the typical client/server configuration is a user workstation operating with a Graphical User Interface (GUI) platform, usually Microsoft Windows, Macintosh, or Motif. At the back end of the configuration is a database server, often managed by a UNIX, Netware, Windows NT, or VMS operating system.

Client/server architecture also takes the form of a server-to-server configuration. In this arrangement, one server plays the role of a client, requesting database services from another server. Multiple database servers can look like a single logical database, providing transparent access to data that is spread around the network.

Designing an efficient client/server application is somewhat of a balancing act, the goal of which is to evenly distribute execution of tasks among processors while making optimal use of available resources. Given the increased complexity and processing power required to manage a graphical user interface (GUI) and the increased demands for throughput on database servers and networks, achieving the proper distribution of tasks is challenging. Client/server systems are inherently more difficult to develop and manage than traditional host-based application systems because of the following challenges:

The components of a client/server system are distributed across more varied types of processors. There are many more software components that manage client, network, and server functions, as well as an array of infrastructure layers, all of which must be in place and configured to be compatible with each other.

The complexity of GUI applications far outweighs that of their character-based predecessors. GUIs are capable of presenting much more information to the user and providing many additional navigation paths to elements of the interface.

Troubleshooting performance problems and errors is more difficult because of the increased number of components and layers in the system.

Databases in a Client/Server Architecture

Client/server technologies have changed the look and architecture of application systems in two ways. Not only has the supporting hardware architecture undergone substantial changes, but there have also been significant changes in the approach to designing the application logic of the system.

Prior to the advent of client/server technology, most Oracle applications ran on a single node. Typically, a character-based SQL*Forms application would access a database instance on the same machine with the application and the RDBMS competing for the same CPU and memory resources. Not only was the system responsible for supporting all the database processing, but it was also responsible for executing the application logic. In addition, the system was burdened with all the I/O processing for each terminal on the system; each keystroke and display attribute was controlled by the same processor that processed database requests and application logic.

Client/server systems change this architecture considerably by splitting all of the interface management and much of the application processing from the host system processor and distributing it to the client processor.

Combined with the advances in hardware infrastructure, the increased capabilities of RDBMS servers have also contributed to changes in the application architecture. Prior to the release of Oracle7, Oracle's RDBMS was less sophisticated in its capability to support the processing logic necessary to maintain the integrity of data in the database. For example, primary and foreign key checking and enforcement was performed by the application. As a result, the database was highly reliant on application code for enforcement of business rules and integrity, making application code bulkier and more complex. Figure 2.1 illustrates the differences between traditional host-based applications and client/server applications. Client/server database applications can take advantage of the Oracle7 server features for implementation of some of the application logic.