Saturday, November 15, 2008

Overview of the Oracle Architecture

This post describes the components of the Oracle database software that are present (in one form or another) on all machines on which the Oracle database can run. I describe the various components (including memory, process, hardware, and network components) and discuss the interaction between them. In addition, I discuss some of the internal objects (such as rollback segments) so you can have a greater understanding of the inner functioning and setup of the database.

Computer Architecture Fundamentals

In this section, I give a very brief overview of the architecture present in all computers; keep in the mind, however, that a stand-alone machine might not have any network connections.

Memory

Memory is a storage device in a computer chip into which instructions and data are entered and retrieved when needed for processing. It is thousands, if not hundreds of thousands, times faster to read information from memory rather than directly from disk; however, there must be an initial load of information into memory from disk. The larger the memory available for a machine, the quicker it will run.

As far as Oracle is concerned, the instructions are the Oracle programs, and the data is the data read from the Oracle database files.

Processes/Programs

A process or program is a set of instructions to a computer that are run on the machine's processor. These instructions, or programs, are stored on the file system of the machine. Before you can run a program, the instructions for it are read into the machine's memory from the file system.

For Oracle, the processes are the Oracle system (background) processes that look after the database or the user processes that perform work for the user accessing the database.

File Systems

Data and programs for the computer are stored on the machine's file system, which typically consists of one or more hard disks. Programs and data, when required, are loaded from the hard disk into the machine's memory before the program or data is used.

Oracle uses the file system for the sets of the files that make up the database and the software programs that enable you to access the database.

Network

A network is a system of connections between machines that enable one machine to communicate with another and share resources. As well as the physical wires and components, you need to establish a set of rules for communication; this is known as a protocol.

Oracle can support many different types of networks and protocols. If you require communication between machines running Oracle software, you must install the Oracle SQL*Net software on all machines requiring network access.

Global View of the Oracle Architecture

The Oracle architecture described in this section is the generic architecture that applies to all platforms on which Oracle runs. There might be differences in the architecture between different platforms, but the fundamentals are the same.

What Is a Database?

A database is a collection of related data that is used and retrieved together for one or more application systems. The physical location and implementation of the database is transparent to the application programs, and in fact, you could move and restructure the physical database without affecting the programs.






Physically, in its simplest form, an Oracle database is nothing more than a set of files somewhere on disk. The physical location of these files is irrelevant to the function (although important for the performance) of the database. The files are binary files that you can only access using the Oracle kernel software. Querying data in the database files is typically done with one of the Oracle tools (such as SQL*Plus) using the Structured Query Language (SQL).

Logically, the database is divided into a set of Oracle user accounts (schemas), each of which is identified by a username and password unique to that database. Tables and other objects are owned by one of these Oracle users, and access to the data is only available by logging in to the database using an Oracle username and password. Without a valid username and password for the database, you are denied access to anything on the database. The Oracle username and password is different from the operating system username and password. For example, a database residing on a UNIX machine requires that I log in to the UNIX machine using my UNIX operating system username and password and then log in again to Oracle before I can use the database objects (the UNIX login would not be required for a client/server setup). This process of logging in, or connecting to, the database is required whether you're using an Oracle or non-Oracle tool.

The same table name can coexist in two separate Oracle user accounts; although the tables might have the same name, they are different tables. Sometimes, the same database (same set of physical database files) is used for holding different versions of tables (in separate Oracle accounts) for the developers, system testing, or user testing, or the same table name is used in different application systems.

Often, people refer to an Oracle user account as a database, but this is not strictly correct. You could use two Oracle user accounts to hold data for two entirely different application systems; you would have two logical databases implemented in the same physical database using two Oracle user accounts.

In addition to physical files, Oracle processes and memory structures must also be present before you can use the database.


Oracle Files

There are three major sets of files on disk that compose a database.

Database files

Control files

Redo logs

The most important of these are the database files where the actual data resides. The control files and the redo logs support the functioning of the architecture itself.

All three sets of files must be present, open, and available to Oracle for any data on the database to be useable. Without these files, you cannot access the database, and the database administrator might have to recover some or all of the database using a backup, if there is one! All the files are binary.

System and User Processes

For the database files to be useable, you must have the Oracle system processes and one or more user processes running on the machine. The Oracle system processes, also known as Oracle background processes, provide functions for the user processes—functions that would otherwise be done by the user processes themselves. There are many background processes that you can initiate, but as a minimum, only the PMON, SMON, DBWR, and LGWR (all described later in the chapter) must be up and running for the database to be useable. Other background processes support optional additions to the way the database runs.

In addition to the Oracle background processes, there is one user process per connection to the database in its simplest setup. The user must make a connection to the database before he can access any of the objects. If one user logs into Oracle using SQL*Plus, another user chooses Oracle Forms, and yet another user employs the Excel spreadsheet, then you have three user processes against the database—one for each connection.

Memory

Oracle uses the memory (either real or virtual) of the system to run the user processes and the system software itself and to cache data objects. There are two major memory areas used by Oracle: memory that is shared and used by all processes running against the database and memory that is local to each individual user process.

System Memory

Oracle database-wide system memory is known as the SGA, the system global area or shared global area. The data and control structures in the SGA are shareable, and all the Oracle background processes and user processes can use them.


What is RDBMS?

In recent years, database management systems (DBMS) have established themselves as the primary means of data storage for information systems ranging from large commercial transaction processing applications to PC-based desktop applications. At the heart of most of today's information systems is a relational database management system (RDBMS). RDBMSs have been the workhorse for data management operations for over a decade and continue to evolve and mature, providing sophisticated storage, retrieval, and distribution functions to enterprise-wide data processing and information management systems. Compared to the file systems, relational database management systems provide organizations with the capability to easily integrate and leverage the massive amounts of operational data into meaningful information systems. The evolution of high-powered database engines such as Oracle7 has fostered the development of advanced "enabling" technologies including client/server, data warehousing, and online analytical processing, all of which comprise the core of today's state-of-the-art information management systems.

Examine the components of the term relational database management system. First, a database is an integrated collection of related data. Given a specific data item, the structure of a database facilitates the access to data related to it, such as a student and all of his registered courses or an employee and his dependents. Next, a relational database is a type of database based in the relational model; non-relational databases commonly use a hierarchical, network, or object-oriented model as their basis. Finally, a relational database management system is the software that manages a relational database. These systems come in several varieties, ranging from single-user desktop systems to full-featured, global, enterprise-wide systems, such as Oracle7.

This blog discusses the basic elements of a relational database management system, the relational database, and the software systems that manage it. Also included is a discussion of nonprocedural data access. If you are a new user to relational database technology, you'll have to change your thinking somewhat when it comes to referencing data nonprocedurally.

The Relational Database Model

Most of the database management systems used by commercial applications today are based on one of three basic models: the hierarchical model, the network model, or the relational model. The following sections describe the various differences and similarities of the models.

Hierarchical and Network Models


The first commercially available database management systems were of the CODASYL type, and many of them are still in use with mainframe-based, COBOL applications. Both network and hierarchical databases are quite complex in that they rely on the use of permanent internal pointers to relate records to each other. For example, in an accounts payable application, a vendor record might contain a physical pointer in its record structure that points to purchase order records. Each purchase order record in turn contains pointers to purchase order line item records.

The process of inserting, updating, and deleting records using these types of databases requires synchronization of the pointers, a task that must be performed by the application. As you might imagine, this pointer maintenance requires a significant amount of application code (usually written in COBOL) that at times can be quite cumbersome.

Elements of the Relational Model


Relational databases rely on the actual attribute values as opposed to internal pointers to link records. Instead of using an internal pointer from the vendor record to purchase order records, you would link the purchase order record to the vendor record using a common attribute from each record, such as the vendor identification number.

Although the concepts of academic theory underlying the relational model are somewhat complex, you should be familiar with are some basic concepts and terminology. Essentially, there are three basic components of the relational model: relational data structures, constraints that govern the organization of the data structures, and operations that are performed on the data structures.

Relational Data Structures

The relational model supports a single, "logical" structure called a relation, a two-dimensional data structure commonly called a table in the "physical" database. Attributes represent the atomic data elements that are related by the relation. For example, the Customer relation might contain such attributes about a customer as the customer number, customer name, region, credit status, and so on.


Key Values and Referential Integrity


Attributes are grouped with other attributes based on their dependency on a primary key value. A primary key is an attribute or group of attributes that uniquely identifies a row in a table. A table has only one primary key, and as a rule, every table has one. Because primary key values are used as identifiers, they cannot be null. Using the conventional notation for relations, an attribute is underlined to indicate that it is the primary key of the relation. If a primary key consists of several attributes, each attribute is underlined.

You can have additional attributes in a relation with values that you define as unique to the relation. Unlike primary keys, unique keys can contain null values. In practice, unique keys are used to prevent duplication in the table rather than identify rows. Consider a relation that contains the attribute, United States Social Security Number (SSN). In some rows, this attribute may be null in since not every person has a SSN; however for a row that contains a non-null value for the SSN attribute, the value must be unique to the relation.

Linking one relation to another typically involves an attribute that is common to both relations. The common attributes are usually a primary key from one table and a foreign key from the other. Referential integrity rules dictate that foreign key values in one relation reference the primary key values in another relation. Foreign keys might also reference the primary key of the same relation. Figure illustrates two foreign key relationships.



Oracle and Client/Server


Oracle Corporation's reputation as a database company is firmly established in its full-featured, high-performance RDBMS server. With the database as the cornerstone of its product line, Oracle has evolved into more than just a database company, complementing its RDBMS server with a rich offering of well-integrated products that are designed specifically for distributed processing and client/server applications. As Oracle's database server has evolved to support large-scale enterprise systems for transaction processing and decision support, so too have its other products, to the extent that Oracle can provide a complete solution for client/server application development and deployment. This chapter presents an overview of client/server database systems and the Oracle product architectures that support their implementation.

An Overview of Client/Server Computing

The premise of client/server computing is to distribute the execution of a task among multiple processors in a network. Each processor is dedicated to a specific, focused set of subtasks that it performs best, and the end result is increased overall efficiency and effectiveness of the system as a whole. Splitting the execution of tasks between processors is done through a protocol of service requests; one processor, the client, requests a service from another processor, the server. The most prevalent implementation of client/server processing involves separating the user interface portion of an application from the data access portion.

On the client, or front end, of the typical client/server configuration is a user workstation operating with a Graphical User Interface (GUI) platform, usually Microsoft Windows, Macintosh, or Motif. At the back end of the configuration is a database server, often managed by a UNIX, Netware, Windows NT, or VMS operating system.

Client/server architecture also takes the form of a server-to-server configuration. In this arrangement, one server plays the role of a client, requesting database services from another server. Multiple database servers can look like a single logical database, providing transparent access to data that is spread around the network.

Designing an efficient client/server application is somewhat of a balancing act, the goal of which is to evenly distribute execution of tasks among processors while making optimal use of available resources. Given the increased complexity and processing power required to manage a graphical user interface (GUI) and the increased demands for throughput on database servers and networks, achieving the proper distribution of tasks is challenging. Client/server systems are inherently more difficult to develop and manage than traditional host-based application systems because of the following challenges:

The components of a client/server system are distributed across more varied types of processors. There are many more software components that manage client, network, and server functions, as well as an array of infrastructure layers, all of which must be in place and configured to be compatible with each other.

The complexity of GUI applications far outweighs that of their character-based predecessors. GUIs are capable of presenting much more information to the user and providing many additional navigation paths to elements of the interface.

Troubleshooting performance problems and errors is more difficult because of the increased number of components and layers in the system.

Databases in a Client/Server Architecture

Client/server technologies have changed the look and architecture of application systems in two ways. Not only has the supporting hardware architecture undergone substantial changes, but there have also been significant changes in the approach to designing the application logic of the system.

Prior to the advent of client/server technology, most Oracle applications ran on a single node. Typically, a character-based SQL*Forms application would access a database instance on the same machine with the application and the RDBMS competing for the same CPU and memory resources. Not only was the system responsible for supporting all the database processing, but it was also responsible for executing the application logic. In addition, the system was burdened with all the I/O processing for each terminal on the system; each keystroke and display attribute was controlled by the same processor that processed database requests and application logic.

Client/server systems change this architecture considerably by splitting all of the interface management and much of the application processing from the host system processor and distributing it to the client processor.

Combined with the advances in hardware infrastructure, the increased capabilities of RDBMS servers have also contributed to changes in the application architecture. Prior to the release of Oracle7, Oracle's RDBMS was less sophisticated in its capability to support the processing logic necessary to maintain the integrity of data in the database. For example, primary and foreign key checking and enforcement was performed by the application. As a result, the database was highly reliant on application code for enforcement of business rules and integrity, making application code bulkier and more complex. Figure 2.1 illustrates the differences between traditional host-based applications and client/server applications. Client/server database applications can take advantage of the Oracle7 server features for implementation of some of the application logic.