Saturday, November 15, 2008

Data Integrity

Oracle also implements the idea of statement-level rollback. If a single statement fails during a transaction, only the work done by that statement is rolled back. In other words, an INSERT statement for 1000 rows will insert either all 1000 rows or none at all; the entire statement works, or nothing happens. If a statement does fail within a transaction, the changes made by the earlier statements in the transaction are still in a pending state and must be committed or rolled back.
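To make statement-level rollback concrete, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for Oracle (an assumption made so the example runs anywhere; the `orders` table is hypothetical). SQLite's default conflict handling backs out only the failing statement and leaves the transaction open, which mirrors the behavior described above.

```python
import sqlite3

# SQLite stands in for Oracle; like Oracle, it undoes only the failing statement.
conn = sqlite3.connect(":memory:", isolation_level=None)  # BEGIN/COMMIT issued explicitly
conn.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY, amount NUMERIC NOT NULL)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders VALUES (1, 100)")  # statement 1 succeeds

try:
    # Statement 2 inserts three rows; the third violates NOT NULL,
    # so rows 2 and 3 are backed out along with it.
    conn.execute("INSERT INTO orders VALUES (2, 50), (3, 75), (4, NULL)")
except sqlite3.IntegrityError as exc:
    print("statement 2 failed:", exc)

still_pending = conn.in_transaction  # the transaction itself is still open
rows = conn.execute("SELECT order_no FROM orders ORDER BY order_no").fetchall()
print(rows)  # only statement 1's row, still waiting for COMMIT or ROLLBACK
conn.execute("COMMIT")
```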

If a user process terminates abnormally (the process is killed, for example), the PMON background process automatically rolls back changes. Any changes that the process had committed up to the point of failure remain committed, and only those changes for the current transaction are rolled back.
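The sketch below approximates this with SQLite through Python's sqlite3 module. It is only an analogy: SQLite has no PMON process, and closing a connection in the middle of a transaction stands in for a killed user process. The database file name and the `orders` table are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical demo file; closing a connection mid-transaction stands in for a killed process.
path = os.path.join(tempfile.mkdtemp(), "pmon_demo.db")

session = sqlite3.connect(path, isolation_level=None)
session.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY, amount NUMERIC)")

session.execute("BEGIN")
session.execute("INSERT INTO orders VALUES (1, 100)")
session.execute("COMMIT")                               # committed before the "failure"

session.execute("BEGIN")
session.execute("INSERT INTO orders VALUES (2, 200)")   # current transaction, never committed
session.close()                                         # session ends without COMMIT

# A new session sees the committed row only; the open transaction was rolled back.
survivor = sqlite3.connect(path)
rows = survivor.execute("SELECT order_no FROM orders ORDER BY order_no").fetchall()
survivor.close()
print(rows)
```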

All locks held by the transaction are automatically released when the transaction commits or rolls back or when the PMON background process rolls back the transaction. In addition, other system resources (such as rollback segments) are released for other transactions to use.

Savepoints enable you to set markers within a transaction so that you have the option of rolling back just part of the work performed in the transaction. You can use savepoints in long and complex transactions to provide the option of reversing certain statements. However, performing the work for a statement and then reversing it adds overhead on the system; a change in the application logic usually produces a better solution. When Oracle rolls back to a savepoint, the statements issued before the savepoint remain in a pending state and must be committed or rolled back. Oracle releases the locks taken by the statements that were rolled back.
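A minimal savepoint sketch, again assuming SQLite via Python's sqlite3 module as a stand-in for Oracle. The `SAVEPOINT` and `ROLLBACK TO SAVEPOINT` statements used here have the same form in Oracle SQL; the table and savepoint names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions controlled explicitly
conn.execute("CREATE TABLE orders (order_no INTEGER PRIMARY KEY, amount NUMERIC)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders VALUES (1, 100)")
conn.execute("SAVEPOINT before_bulk")                # marker inside the transaction
conn.execute("INSERT INTO orders VALUES (2, 200)")
conn.execute("INSERT INTO orders VALUES (3, 300)")
conn.execute("ROLLBACK TO SAVEPOINT before_bulk")    # undo only the work after the marker

still_pending = conn.in_transaction                  # row 1 is still pending
rows = conn.execute("SELECT order_no FROM orders ORDER BY order_no").fetchall()
conn.execute("COMMIT")                               # makes row 1 permanent
print(rows)
```

After the partial rollback, row 1 is still pending, and the final COMMIT makes it permanent.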

Data Integrity

Data integrity is about enforcing data validation rules, such as checking that a percentage amount is between 0 and 100, to ensure that invalid data does not get into your tables. Historically, these rules were enforced by the application programs themselves (and the same rules were checked repeatedly in different programs). Oracle, however, enables you to define and store these rules against the database objects to which they relate. You code each rule only once, and it is enforced whenever any kind of change is made to the table, regardless of which tool issues the insert, update, or delete statement. This checking takes the form of integrity constraints and database triggers.

Integrity Constraints

Integrity constraints enforce business rules at the database level by defining a set of checks for the tables in your system. These checks are automatically enforced whenever you issue an insert, update, or delete statement against the table. If any of the constraints are violated, the insert, update, or delete statement is rolled back. The other statements within the transaction remain in a pending state and can be committed or rolled back according to application logic.

Because integrity constraints are checked at the database level, they are performed regardless of where the insert, update, or delete statement originated, whether from an Oracle or a non-Oracle tool. Defining checks using these constraints is also quicker than performing the same checks with SQL in the application. In addition, the information provided by declaring constraints is used by the Oracle optimizer to make better decisions about how to run a statement against the table. The Oracle Forms product can also use constraints to automatically generate code in the front-end programs to provide an early warning to the user of any errors.

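The types of integrity constraints that you can set up on a table are NOT NULL, PRIMARY KEY, UNIQUE, FOREIGN KEY, and CHECK. Indexes are not constraints themselves, but they are closely tied to the PRIMARY KEY and UNIQUE constraints, so they are covered here as well.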

NOT NULL Constraints

You set the NOT NULL constraint against a column to specify that the column must always have a value on every row; it can never be null. By default, all the columns in a table are nullable. For example, using a NOT NULL constraint on an orders table, you can specify that there must always be an order amount.
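As an illustration, the following sketch declares the orders table from this example with a NOT NULL order amount. It assumes SQLite via Python's sqlite3 module in place of Oracle; the column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_no     INTEGER PRIMARY KEY,
        order_amount NUMERIC NOT NULL,  -- every order must have an amount
        notes        TEXT               -- nullable, the default for any column
    )
""")
conn.execute("INSERT INTO orders (order_no, order_amount) VALUES (1, 250)")  # notes left null: fine

try:
    conn.execute("INSERT INTO orders (order_no, notes) VALUES (2, 'rush')")  # no amount given
    missing_amount_rejected = False
except sqlite3.IntegrityError:
    missing_amount_rejected = True

print(missing_amount_rejected)
```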

PRIMARY KEY

The PRIMARY KEY constraint defines a column or a set of columns that you can use to uniquely identify a single row. No two rows in the table can have the same values for the primary key columns. In addition, the columns for a primary key constraint must always have a value; in other words, they are NOT NULL. If you add a PRIMARY KEY constraint to a table after the table has been created, the columns that make up the constraint are modified to NOT NULL. Only one PRIMARY KEY constraint can exist for any table. For example, using a PRIMARY KEY constraint on an orders table, you can specify that a table cannot have two records with the same order number.
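The sketch below exercises both halves of the PRIMARY KEY rule (no duplicates, no nulls) on a hypothetical orders table, using SQLite through Python's sqlite3 module as a stand-in for Oracle. One SQLite difference is called out in the code comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite quirk: only INTEGER PRIMARY KEY columns are implicitly NOT NULL, so
# NOT NULL is spelled out here to match the Oracle behavior described above.
conn.execute("CREATE TABLE orders (order_no TEXT NOT NULL PRIMARY KEY, order_amount NUMERIC)")
conn.execute("INSERT INTO orders VALUES ('A-100', 250)")

def try_insert(order_no):
    """Attempt one insert and report the outcome instead of raising."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, 10)", (order_no,))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

duplicate = try_insert("A-100")  # same order number as an existing row
null_key = try_insert(None)      # primary key columns cannot be null
fresh = try_insert("A-101")      # a new, non-null order number
print(duplicate, null_key, fresh, sep="\n")
```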

UNIQUE

The UNIQUE constraint defines a secondary key for the table. This is a column or set of columns that you can use as another way of uniquely identifying a row. No two rows can have the same values for the UNIQUE key column or columns. Although it is not possible for a table to have more than one primary key, a table can have more than one UNIQUE constraint.

The columns for a UNIQUE constraint do not have to be declared NOT NULL (although they usually are). A null never violates a single-column UNIQUE constraint, so any number of rows can leave that column null. For example, using a PRIMARY KEY and UNIQUE constraint on a customers table, you can specify that the customer number is a primary key and that the customer name is a unique key (which would mean that you could not have two customers with the same name on your table, a rare situation).
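Here is a minimal sketch of the customers example, assuming SQLite via Python's sqlite3 module in place of Oracle. For a single-column UNIQUE key, SQLite, like Oracle, lets any number of rows hold a null; the column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        customer_no   INTEGER PRIMARY KEY,  -- primary key
        customer_name TEXT UNIQUE           -- secondary (unique) key, nullable
    )
""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO customers VALUES (2, NULL)")  # nulls never clash...
conn.execute("INSERT INTO customers VALUES (3, NULL)")  # ...with each other

try:
    conn.execute("INSERT INTO customers VALUES (4, 'Acme')")  # duplicate name
    duplicate_name_rejected = False
except sqlite3.IntegrityError:
    duplicate_name_rejected = True

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(duplicate_name_rejected, count)
```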


FOREIGN KEY

The FOREIGN KEY or referential integrity constraint enforces relationship integrity between tables. It dictates that a column or set of columns on the table match a PRIMARY KEY or UNIQUE constraint on a different table. For example, you could set up a FOREIGN KEY constraint on the orders table to specify that whenever an order record is inserted or updated, the customer number must also exist in the customers table. This ensures that you don't get orders for nonexistent customers.

You use FOREIGN KEY constraints to enforce parent/child relationships between tables. You can even use them to enforce self-referential constraints, usually in situations where a hierarchical structure is set up with all the rows held in the same table. If any of the columns of the foreign key are null, the constraint is not checked at all. Foreign key columns are usually declared as NOT NULL.
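The following sketch shows a parent/child foreign key, a null foreign key value, and a self-referential hierarchy. It assumes SQLite via Python's sqlite3 module as a stand-in for Oracle; unlike Oracle, SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_no INTEGER REFERENCES customers (customer_no)
    );
    -- Self-referential hierarchy: each employee's manager is another employee.
    CREATE TABLE employees (
        emp_no     INTEGER PRIMARY KEY,
        manager_no INTEGER REFERENCES employees (emp_no)
    );
    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO orders VALUES (10, 1);      -- parent exists: accepted
    INSERT INTO orders VALUES (11, NULL);   -- null foreign key: not checked
    INSERT INTO employees VALUES (1, NULL); -- top of the hierarchy
    INSERT INTO employees VALUES (2, 1);    -- reports to employee 1
""")

try:
    conn.execute("INSERT INTO orders VALUES (12, 99)")  # customer 99 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```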

It is possible to specify that when a parent row is deleted, the delete should automatically cascade and delete the child rows, which can be dangerous. The user is informed only about the master rows that were removed and might not be aware that additional child rows were deleted automatically in the background, because nothing reports that the cascading deletion has happened.

Only this automatic deletion of child rows is supported declaratively, by adding the ON DELETE CASCADE clause to the end of the foreign key definition. If you change the master table's key value, however, the child rows are not updated automatically to reflect the new key; you can implement this update cascade requirement using database triggers.
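A sketch of both behaviors, assuming SQLite via Python's sqlite3 module in place of Oracle, with hypothetical tables and trigger name. ON DELETE CASCADE removes the child rows, while a trigger (written in SQLite's trigger syntax rather than PL/SQL) propagates a change to the parent key. SQLite does offer a declarative ON UPDATE CASCADE, but the trigger is used here to match the approach described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE customers (customer_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        customer_no INTEGER REFERENCES customers (customer_no) ON DELETE CASCADE
    );
    -- Update cascade implemented with a trigger rather than a declarative clause.
    CREATE TRIGGER customers_key_cascade
    AFTER UPDATE OF customer_no ON customers
    FOR EACH ROW
    BEGIN
        UPDATE orders SET customer_no = NEW.customer_no
         WHERE customer_no = OLD.customer_no;
    END;
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1), (11, 1), (20, 2);
""")

# Change a parent key; the trigger carries the new value to the child rows.
conn.execute("UPDATE customers SET customer_no = 100 WHERE customer_no = 1")
after_update = conn.execute(
    "SELECT order_no, customer_no FROM orders ORDER BY order_no").fetchall()

# Delete the parent; ON DELETE CASCADE silently removes its orders as well.
cur = conn.execute("DELETE FROM customers WHERE customer_no = 100")
deleted_customers = cur.rowcount  # counts only the rows the DELETE removed directly
remaining = conn.execute("SELECT order_no FROM orders ORDER BY order_no").fetchall()
print(after_update, deleted_customers, remaining)
```

The DELETE reports one affected row even though two orders disappeared with it, which is the hidden side effect of cascading deletes.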

CHECK

A CHECK constraint specifies additional logic that must hold for an insert or update statement to succeed on the table. The condition evaluates to true, false, or (when a null is involved) unknown, and the statement is rejected only if the condition is false for a row being inserted or updated. The syntax of a CHECK constraint is very similar to the syntax found in the WHERE clause of a SELECT statement; however, you cannot use subqueries or functions whose values vary over time (such as SYSDATE). You can use database triggers to perform the additional processing that you cannot put into constraints. For example, using a CHECK constraint on the orders table, you can specify that the order amount must be greater than zero and that the salesman's commission cannot be greater than 10 percent of the order total.
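A minimal sketch of the orders CHECK rules, assuming SQLite via Python's sqlite3 module as a stand-in for Oracle; the column names are hypothetical. The last row shows that a condition made unknown by a null does not reject the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_no     INTEGER PRIMARY KEY,
        order_amount NUMERIC NOT NULL,
        commission   NUMERIC,
        CHECK (order_amount > 0),
        CHECK (commission <= order_amount * 0.10)  -- at most 10 percent
    )
""")

def accepted(row):
    """Return True if the row passes every constraint, False if one rejects it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted((1, 1000, 80)),    # positive amount, commission under 10 percent
    accepted((2, 0, 0)),        # amount not greater than zero
    accepted((3, 1000, 150)),   # commission above 10 percent
    accepted((4, 500, None)),   # null commission: condition is unknown, not false
]
print(results)
```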

Indexes

PRIMARY KEY and UNIQUE constraints automatically create an index on the columns they're defined against if the constraint is enabled upon creation. If a suitable index already exists on the columns that form the PRIMARY KEY or UNIQUE constraint, Oracle uses that index instead of creating a new one. Oracle creates the indexes when the constraint is enabled (which is the default when the constraint is first added to the table) and drops the indexes it created when the constraint is disabled. Enabling and disabling constraints can therefore take significant time and system overhead because of the index creation and removal.

When you set up a FOREIGN KEY constraint, the columns are not indexed automatically. Because foreign key columns are usually involved in joining tables together, you typically create indexes on those columns manually.
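The sketch below, assuming SQLite via Python's sqlite3 module rather than Oracle, lists the indexes on each table before and after creating a foreign key index by hand. The `origin` column of SQLite's `PRAGMA index_list` reports whether an index came from a primary key (`pk`), a unique constraint (`u`), or a CREATE INDEX statement (`c`). Table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- TEXT keys are used because an INTEGER PRIMARY KEY in SQLite is the rowid
    -- itself and gets no separate index.
    CREATE TABLE customers (
        customer_code TEXT NOT NULL PRIMARY KEY,
        email         TEXT UNIQUE
    );
    CREATE TABLE orders (
        order_no      INTEGER PRIMARY KEY,
        customer_code TEXT REFERENCES customers (customer_code)
    );
""")

def index_origins(table):
    """Sorted origins of the indexes on a table ('pk', 'u', or 'c')."""
    # PRAGMA index_list columns: seq, name, unique, origin, partial
    return sorted(row[3] for row in conn.execute(f"PRAGMA index_list({table})"))

customer_indexes = index_origins("customers")  # created by the two constraints
orders_before = index_origins("orders")        # foreign key column: no index yet
conn.execute("CREATE INDEX orders_customer_idx ON orders (customer_code)")
orders_after = index_origins("orders")         # the manually created index
print(customer_indexes, orders_before, orders_after)
```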

Database Triggers

A database trigger is a PL/SQL block that you can define to execute automatically for insert, update, and delete statements against a table. You can define the trigger to execute once for the entire statement or once for every row that is inserted, updated, or deleted. For any one table, there are twelve events for which you can define database triggers: each combination of timing (before or after), statement type (insert, update, or delete), and level (statement or row) gives 2 × 3 × 2 = 12 events. For each of these events, you can define multiple database triggers.

A database trigger can call database procedures that are also written in PL/SQL. Unlike database triggers, procedures on the database are stored in a compiled form. For this reason, you should put the longer code segments into a procedure and then call the procedure from the database trigger.

In addition to implementing complex business rules, checking, and defaulting, you can use database triggers to insert into, update, and delete from other tables. An example of this use is an auditing facility, in which a trigger automatically writes an audit trail to an audit table whenever a row is changed on a table. Without database triggers, this function would have to be implemented in the front-end programs that make the change to the database; however, someone bypassing those programs (using SQL*Plus, for example) would not go through the checks and processing defined.
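A minimal audit-trail sketch, assuming SQLite via Python's sqlite3 module in place of Oracle. In Oracle the trigger body would be PL/SQL and would refer to `:OLD` and `:NEW`; SQLite's trigger syntax uses `OLD` and `NEW` without the colon. The audit table layout and trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY, order_amount NUMERIC);
    CREATE TABLE orders_audit (
        order_no   INTEGER,
        old_amount NUMERIC,
        new_amount NUMERIC,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Row-level trigger: fires once for every row the UPDATE changes.
    CREATE TRIGGER orders_audit_trg
    AFTER UPDATE OF order_amount ON orders
    FOR EACH ROW
    BEGIN
        INSERT INTO orders_audit (order_no, old_amount, new_amount)
        VALUES (OLD.order_no, OLD.order_amount, NEW.order_amount);
    END;
    INSERT INTO orders VALUES (1, 100), (2, 200);
""")

# Any client issuing this statement goes through the trigger.
conn.execute("UPDATE orders SET order_amount = order_amount * 2")
audit = conn.execute(
    "SELECT order_no, old_amount, new_amount FROM orders_audit ORDER BY order_no"
).fetchall()
print(audit)
```

Because the trigger lives in the database, the audit rows are written no matter which program issues the UPDATE.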

Database triggers differ from constraints in that they enable you to embed SQL statements within them, whereas constraints do not.

What is RDBMS?

In recent years, database management systems (DBMS) have established themselves as the primary means of data storage for information systems ranging from large commercial transaction processing applications to PC-based desktop applications. At the heart of most of today's information systems is a relational database management system (RDBMS). RDBMSs have been the workhorse for data management operations for over a decade and continue to evolve and mature, providing sophisticated storage, retrieval, and distribution functions to enterprise-wide data processing and information management systems. Compared to file systems, relational database management systems provide organizations with the capability to easily integrate and leverage massive amounts of operational data into meaningful information systems. The evolution of high-powered database engines such as Oracle7 has fostered the development of advanced "enabling" technologies including client/server, data warehousing, and online analytical processing, all of which form the core of today's state-of-the-art information management systems.

Examine the components of the term relational database management system. First, a database is an integrated collection of related data. Given a specific data item, the structure of a database facilitates access to the data related to it, such as a student and all of his registered courses or an employee and his dependents. Next, a relational database is a type of database based on the relational model; non-relational databases commonly use a hierarchical, network, or object-oriented model as their basis. Finally, a relational database management system is the software that manages a relational database. These systems come in several varieties, ranging from single-user desktop systems to full-featured, global, enterprise-wide systems, such as Oracle7.

This blog discusses the basic elements of a relational database management system, the relational database, and the software systems that manage it. Also included is a discussion of nonprocedural data access. If you are new to relational database technology, you'll have to change your thinking somewhat when it comes to referencing data nonprocedurally.

The Relational Database Model

Most of the database management systems used by commercial applications today are based on one of three basic models: the hierarchical model, the network model, or the relational model. The following sections describe the various differences and similarities of the models.

Hierarchical and Network Models


The first commercially available database management systems were of the CODASYL type, and many of them are still in use with mainframe-based COBOL applications. Both network and hierarchical databases are quite complex in that they rely on the use of permanent internal pointers to relate records to each other. For example, in an accounts payable application, a vendor record might contain a physical pointer in its record structure that points to purchase order records. Each purchase order record in turn contains pointers to purchase order line item records.

The process of inserting, updating, and deleting records using these types of databases requires synchronization of the pointers, a task that must be performed by the application. As you might imagine, this pointer maintenance requires a significant amount of application code (usually written in COBOL) that at times can be quite cumbersome.

Elements of the Relational Model


Relational databases rely on the actual attribute values as opposed to internal pointers to link records. Instead of using an internal pointer from the vendor record to purchase order records, you would link the purchase order record to the vendor record using a common attribute from each record, such as the vendor identification number.
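To show linking by value instead of by pointer, here is a minimal sketch using SQLite via Python's sqlite3 module (an assumption; any relational database would do). The hypothetical `vendors` and `purchase_orders` tables share only the `vendor_id` attribute, and a join follows that common value at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE purchase_orders (
        po_no     INTEGER PRIMARY KEY,
        vendor_id INTEGER,   -- common attribute value, not a physical pointer
        po_total  NUMERIC
    );
    INSERT INTO vendors VALUES (1, 'Acme Supply'), (2, 'Globex Parts');
    INSERT INTO purchase_orders VALUES (500, 1, 1200), (501, 2, 300), (502, 1, 75);
""")

# The relationship is resolved at query time by matching vendor_id values.
acme_orders = conn.execute("""
    SELECT v.vendor_name, p.po_no
      FROM vendors v
      JOIN purchase_orders p ON p.vendor_id = v.vendor_id
     WHERE v.vendor_id = 1
     ORDER BY p.po_no
""").fetchall()
print(acme_orders)
```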

Although the academic theory underlying the relational model is somewhat complex, you should be familiar with some basic concepts and terminology. Essentially, there are three basic components of the relational model: relational data structures, constraints that govern the organization of the data structures, and operations that are performed on the data structures.

Relational Data Structures

The relational model supports a single, "logical" structure called a relation, a two-dimensional data structure commonly called a table in the "physical" database. Attributes represent the atomic data elements that are related by the relation. For example, the Customer relation might contain such attributes about a customer as the customer number, customer name, region, credit status, and so on.


Key Values and Referential Integrity


Attributes are grouped with other attributes based on their dependency on a primary key value. A primary key is an attribute or group of attributes that uniquely identifies a row in a table. A table has only one primary key, and as a rule, every table has one. Because primary key values are used as identifiers, they cannot be null. Using the conventional notation for relations, an attribute is underlined to indicate that it is the primary key of the relation. If a primary key consists of several attributes, each attribute is underlined.

You can have additional attributes in a relation with values that you define as unique to the relation. Unlike primary keys, unique keys can contain null values. In practice, unique keys are used to prevent duplication in the table rather than to identify rows. Consider a relation that contains the attribute United States Social Security Number (SSN). In some rows, this attribute may be null because not every person has an SSN; however, for a row that contains a non-null value for the SSN attribute, the value must be unique to the relation.

Linking one relation to another typically involves an attribute that is common to both relations. The common attributes are usually a primary key from one table and a foreign key from the other. Referential integrity rules dictate that foreign key values in one relation reference the primary key values in another relation. Foreign keys might also reference the primary key of the same relation. Figure illustrates two foreign key relationships.



Oracle and Client/Server


Oracle Corporation's reputation as a database company is firmly established in its full-featured, high-performance RDBMS server. With the database as the cornerstone of its product line, Oracle has evolved into more than just a database company, complementing its RDBMS server with a rich offering of well-integrated products that are designed specifically for distributed processing and client/server applications. As Oracle's database server has evolved to support large-scale enterprise systems for transaction processing and decision support, so too have its other products, to the extent that Oracle can provide a complete solution for client/server application development and deployment. This chapter presents an overview of client/server database systems and the Oracle product architectures that support their implementation.

An Overview of Client/Server Computing

The premise of client/server computing is to distribute the execution of a task among multiple processors in a network. Each processor is dedicated to a specific, focused set of subtasks that it performs best, and the end result is increased overall efficiency and effectiveness of the system as a whole. Splitting the execution of tasks between processors is done through a protocol of service requests; one processor, the client, requests a service from another processor, the server. The most prevalent implementation of client/server processing involves separating the user interface portion of an application from the data access portion.

On the client, or front end, of the typical client/server configuration is a user workstation operating with a Graphical User Interface (GUI) platform, usually Microsoft Windows, Macintosh, or Motif. At the back end of the configuration is a database server, often managed by a UNIX, Netware, Windows NT, or VMS operating system.

Client/server architecture also takes the form of a server-to-server configuration. In this arrangement, one server plays the role of a client, requesting database services from another server. Multiple database servers can look like a single logical database, providing transparent access to data that is spread around the network.

Designing an efficient client/server application is somewhat of a balancing act, the goal of which is to evenly distribute execution of tasks among processors while making optimal use of available resources. Given the increased complexity and processing power required to manage a graphical user interface (GUI) and the increased demands for throughput on database servers and networks, achieving the proper distribution of tasks is challenging. Client/server systems are inherently more difficult to develop and manage than traditional host-based application systems because of the following challenges:

The components of a client/server system are distributed across more varied types of processors. There are many more software components that manage client, network, and server functions, as well as an array of infrastructure layers, all of which must be in place and configured to be compatible with each other.

The complexity of GUI applications far outweighs that of their character-based predecessors. GUIs are capable of presenting much more information to the user and providing many additional navigation paths to elements of the interface.

Troubleshooting performance problems and errors is more difficult because of the increased number of components and layers in the system.

Databases in a Client/Server Architecture

Client/server technologies have changed the look and architecture of application systems in two ways. Not only has the supporting hardware architecture undergone substantial changes, but there have also been significant changes in the approach to designing the application logic of the system.

Prior to the advent of client/server technology, most Oracle applications ran on a single node. Typically, a character-based SQL*Forms application would access a database instance on the same machine with the application and the RDBMS competing for the same CPU and memory resources. Not only was the system responsible for supporting all the database processing, but it was also responsible for executing the application logic. In addition, the system was burdened with all the I/O processing for each terminal on the system; each keystroke and display attribute was controlled by the same processor that processed database requests and application logic.

Client/server systems change this architecture considerably by splitting all of the interface management and much of the application processing from the host system processor and distributing it to the client processor.

Combined with the advances in hardware infrastructure, the increased capabilities of RDBMS servers have also contributed to changes in the application architecture. Prior to the release of Oracle7, Oracle's RDBMS was less sophisticated in its capability to support the processing logic necessary to maintain the integrity of data in the database. For example, primary and foreign key checking and enforcement was performed by the application. As a result, the database was highly reliant on application code for enforcement of business rules and integrity, making application code bulkier and more complex. Figure 2.1 illustrates the differences between traditional host-based applications and client/server applications. Client/server database applications can take advantage of the Oracle7 server features for implementation of some of the application logic.