Tuesday, November 18, 2008

The CREATE Statement in SQL

The CREATE statement opens the world to the user. Whether a simple temporary table is to be created or a complex database schema, you will repeatedly use the CREATE statement. Only a few of the more common CREATE statements are covered here.
Tables

Every database designer will have to create a table sometime. The CREATE TABLE system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create a table is :

CREATE TABLE schema.TABLE (COLUMN DATATYPE default expression column constraint) table constraint


PCTFREE x PCTUSED x INITRANS x MAXTRANS x TABLESPACE name STORAGE clause CLUSTER cluster clause

ENABLE clause DISABLE clause AS subquery

In this syntax, SCHEMA is an optional parameter to identify which database schema to place this table in. The default is your own.

TABLE is mandatory and is the name of your table.

COLUMN DATATYPE are required to identify each column in the table. Separate the columns with commas. There is a maximum of 254 columns per table.

The DEFAULT expression is optional and is used to assign a default value to a column when a subsequent insert statement fails to assign a value.

COLUMN CONSTRAINT is optional. It is used to define an integrity constraint such as not null.

TABLE CONSTRAINT is optional and is used to define an integrity constraint as part of the table, such as the primary key.

PCTFREE is optional but has a default of 10. This indicates that 10 percent for each data block will be reserved for future updates to the table's rows. Integers from 1 to 99 are allowed.

PCTUSED is optional but has a default of 40. This indicates the minimum percentage of space used that Oracle maintains before a data block becomes a candidate for row insertion. Integers from 1 to 99 are allowed. The sum of PCTFREE and PCTUSED must be less than 100.

INITRANS is optional but has a default of 1. Integers from 1 to 255 are allowed. It is recommended that you leave this alone. This is an allocation of the number of transaction entries assigned within the data block for the table.

MAXTRANS is optional but has a default that is a function of the data block size. This is used to identify the maximum number of concurrent transactions that can update a data block for your table. It is recommended that this parameter not be changed.

TABLESPACE is optional but has a default value as the tablespace name of the owner of the schema. A different tablespace name than the default can be used. Tablespace names are usually application-dependent. The DBA will be able to give proper recommendations.

STORAGE is optional and has default characteristics defined by the DBA.

CLUSTER is optional and specifies that a table is to be part of a cluster. You must identify the columns from the table that need to be clustered. Typically, the cluster columns are columns that comprise the primary key.

ENABLE is optional and turns on an integrity constraint.

DISABLE is optional and turns off an integrity constraint.

AS SUBQUERY is optional and inserts the rows returned by the subquery into the table upon creation.

Once the table is created, you can use the ALTER TABLE command to make alterations to the table. To modify an integrity constraint, DROP the constraint first, and then re-create it.

Let's look at two examples on creating tables:

CREATE TABLE ADDRESSES (ADRS_ID NUMBER(6),

ACTIVE_DATE DATE,

BOX_NUMBER NUMBER(6),

ADDRS_1 VARCHAR2(40),

ADDRS_2 VARCHAR2(40),

CITY VARCHAR2(40),

STATE VARCHAR2(2),

ZIP VARCHAR2(10));


This is the simplest form of a table create using all of the default capabilities. The second example follows:

CREATE TABLE ADDRESSES (ADRS_ID NUMBER(6) CONSTRAINT PK_ADRS PRIMARY KEY,

ACTIVE_DATE DATE DEFAULT SYSDATE,

BOX_NUMBER NUMBER(6) DEFAULT NULL,

ADDRS_1 VARCHAR2(40) NOT NULL,

ADDRS_2 VARCHAR2(40) DEFAULT NULL,

CITY VARCHAR2(40) DEFAULT NULL,

STATE VARCHAR2(2) DEFAULT 'NY',

ZIP VARCHAR2(10))

PCTFREE 5

PCTUSED 65

TABLESPACE adrs_data

STORAGE (INITIAL 5140

NEXT 5140

MINEXTENTS 1

MAXEXTENTS 10

PCTINCREASE 10);

In this example, data constraints are being utilized and certain storage parameters will be in effect. Using PCTFREE and PCTUSED is a good idea if your data is relatively static.
Indexes

Indexes are used to increase performance of the database. An index is created on one or more columns of a table or cluster. Multiple indexes per table are allowed. The CREATE INDEX system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create an index is

CREATE INDEX schema.index ON schema.table (COLUMN ASC/DESC)

CLUSTER schema.cluster INITRANS x MAXTRANS x TABLESPACE name STORAGE clause PCTFREE x NOSORT

In this syntax, SCHEMA is an optional parameter to identify which database schema to place this table in. The default is your own.

INDEX is mandatory and is the name of the index.

ON is a mandatory reserved word.

TABLE is a mandatory table name upon which the index will be built.

COLUMN is the column name to be indexed. If there is more than one column, make sure they are in order of priority.

ASC/DESC are optional parameters. Indexes are built in ascending order by default. Use DESC for descending order.

CLUSTER is needed only if this index is for a cluster.

INITRANS is optional but has a default of 1. Integers from 1 to 255 are allowed. It is recommended that this parameter not be changed. This is an allocation of the number of transaction entries assigned within the data block for the index.

MAXTRANS is optional but has a default that is a function of the data block size. It is used to identify the maximum number of concurrent transactions that can update a data block for the index. It is recommended that this parameter not be changed.

TABLESPACE is optional but has a default value as the tablespace name of the owner of the schema. A different tablespace name than the default might be needed. The DBA will be able to give some recommendations.

STORAGE is optional and has default characteristics defined by the DBA.

PCTFREE is optional but has a default of 10. This indicates that 10 percent for each data block will be reserved for future updates to the index. Integers from 1 to 99 are allowed.

NOSORT is an optional parameter that will save time when creating the index if the table data is already stored in ascending order. This cannot be used if a clustered index is being created.

Using the addresses table defined from the create table example, two indexes will be created in the next example.

CREATE INDEX x_adrs_id ON ADDRESSES (ADRS_ID);

This will create an index on the adrs_id column only.

CREATE INDEX x_city_state ON ADDRESSES (CITY,STATE)

TABLESPACE application_indexes;

This index has two columns; CITY is the primary column. In order for queries to use an index, the column names must be part of the select statement. If a select statement included STATE but not CITY, the index would not be used. However, if the select statement contained a reference to CITY but not STATE, part of the index would be used because CITY is the first column of the index.
Sequences

Sequences are a great way to have the database automatically generate unique integer primary keys. The CREATE SEQUENCE system privilege is needed to execute this command. The DBA is responsible for administering these privileges. The syntax to create a sequence is

CREATE SEQUENCE schema.name

INCREMENT BY x

START WITH x

MAXVALUE x NOMAXVALUE

MINVALUE x NOMINVALUE

CYCLE NOCYCLE

CACHE x NOCACHE

ORDER NOORDER

In this syntax, SCHEMA is an optional parameter that identifies which database schema to place this sequence in. The default is your own.

NAME is mandatory because it is the name of the sequence.

INCREMENT BY is optional. The default is one. Zero is not allowed. If a negative integer is specified, the sequence will descend in order. A positive integer will make the sequence ascend (the default).

START WITH is an optional integer that enables the sequence to begin anywhere.

MAXVALUE is an optional integer that places a limit on the sequence.

NOMAXVALUE is optional. It causes the maximum ascending limit to be 10 27 and -1 for descending sequences. This is the default.

MINVALUE is an optional integer that determines the minimum a sequence can be.

NOMINVALUE is optional. It causes the minimum ascending limit to be 1 and -(10 26) for descending sequences. This is the default.

CYCLE is an option that enables the sequence to continue even when the maximum has been reached. If the maximum is reached, the next sequence that will be generated is whatever the minimum value is.

NOCYCLE is an option that does not enable the sequence to generate values beyond the defined maximum or minimum. This is the default.

CACHE is an option that enables sequence numbers to be preallocated that will be stored in memory for faster access. The minimum value is 2.

NOCACHE is an option that will not enable the preallocation of sequence numbers.

ORDER is an option that ensures the sequence numbers are generated in order of request.

NOORDER is an option that does not ensure that sequence numbers are generated in the order they are requested.

If you want to create a sequence for your adrs_id column in the ADDRESSES table, it could look like the following example:

CREATE SEQUENCE adrs_seq

INCREMENT BY 5

START WITH 100;

To generate a new sequence number, use the pseudocolumn NEXTVAL. This needs to be preceded with your sequence name. For example, adrs_seq.nextval would return 100 for the first access and 105 for the second. If determining the current sequence number is necessary, use CURRVAL. Therefore, adrs_seq.currval will return the current value of the sequence.
Other Objects

The purpose of this chapter is not to elaborate on every SQL statement. The ones given have been covered to give an overview of the more common create statements. Listed next is an alphabetical list of all objects that can be created with the CREATE statement.

CREATE xxx, where xxx is one of the following:

CLUSTER

CONTROLFILE

DATABASE

DATABASE LINK

DATAFILE

FUNCTION

INDEX

PACKAGE BODY

PACKAGE

PROCEDURE

PROFILE

ROLE

ROLLBACK SEGMENT

SCHEMA

SEQUENCE

SNAPSHOT

SNAPSHOT LOG

SYNONYM

TABLE

TABLESPACE

TRIGGER

USER

VIEW
What is RDBMS?

In recent years, database management systems (DBMS) have established themselves as the primary means of data storage for information systems ranging from large commercial transaction processing applications to PC-based desktop applications. At the heart of most of today's information systems is a relational database management system (RDBMS). RDBMSs have been the workhorse for data management operations for over a decade and continue to evolve and mature, providing sophisticated storage, retrieval, and distribution functions to enterprise-wide data processing and information management systems. Compared to the file systems, relational database management systems provide organizations with the capability to easily integrate and leverage the massive amounts of operational data into meaningful information systems. The evolution of high-powered database engines such as Oracle7 has fostered the development of advanced "enabling" technologies including client/server, data warehousing, and online analytical processing, all of which comprise the core of today's state-of-the-art information management systems.

Examine the components of the term relational database management system. First, a database is an integrated collection of related data. Given a specific data item, the structure of a database facilitates the access to data related to it, such as a student and all of his registered courses or an employee and his dependents. Next, a relational database is a type of database based in the relational model; non-relational databases commonly use a hierarchical, network, or object-oriented model as their basis. Finally, a relational database management system is the software that manages a relational database. These systems come in several varieties, ranging from single-user desktop systems to full-featured, global, enterprise-wide systems, such as Oracle7.

This blog discusses the basic elements of a relational database management system, the relational database, and the software systems that manage it. Also included is a discussion of nonprocedural data access. If you are a new user to relational database technology, you'll have to change your thinking somewhat when it comes to referencing data nonprocedurally.

The Relational Database Model

Most of the database management systems used by commercial applications today are based on one of three basic models: the hierarchical model, the network model, or the relational model. The following sections describe the various differences and similarities of the models.

Hierarchical and Network Models


The first commercially available database management systems were of the CODASYL type, and many of them are still in use with mainframe-based, COBOL applications. Both network and hierarchical databases are quite complex in that they rely on the use of permanent internal pointers to relate records to each other. For example, in an accounts payable application, a vendor record might contain a physical pointer in its record structure that points to purchase order records. Each purchase order record in turn contains pointers to purchase order line item records.

The process of inserting, updating, and deleting records using these types of databases requires synchronization of the pointers, a task that must be performed by the application. As you might imagine, this pointer maintenance requires a significant amount of application code (usually written in COBOL) that at times can be quite cumbersome.

Elements of the Relational Model


Relational databases rely on the actual attribute values as opposed to internal pointers to link records. Instead of using an internal pointer from the vendor record to purchase order records, you would link the purchase order record to the vendor record using a common attribute from each record, such as the vendor identification number.

Although the concepts of academic theory underlying the relational model are somewhat complex, you should be familiar with are some basic concepts and terminology. Essentially, there are three basic components of the relational model: relational data structures, constraints that govern the organization of the data structures, and operations that are performed on the data structures.

Relational Data Structures

The relational model supports a single, "logical" structure called a relation, a two-dimensional data structure commonly called a table in the "physical" database. Attributes represent the atomic data elements that are related by the relation. For example, the Customer relation might contain such attributes about a customer as the customer number, customer name, region, credit status, and so on.


Key Values and Referential Integrity


Attributes are grouped with other attributes based on their dependency on a primary key value. A primary key is an attribute or group of attributes that uniquely identifies a row in a table. A table has only one primary key, and as a rule, every table has one. Because primary key values are used as identifiers, they cannot be null. Using the conventional notation for relations, an attribute is underlined to indicate that it is the primary key of the relation. If a primary key consists of several attributes, each attribute is underlined.

You can have additional attributes in a relation with values that you define as unique to the relation. Unlike primary keys, unique keys can contain null values. In practice, unique keys are used to prevent duplication in the table rather than identify rows. Consider a relation that contains the attribute, United States Social Security Number (SSN). In some rows, this attribute may be null in since not every person has a SSN; however for a row that contains a non-null value for the SSN attribute, the value must be unique to the relation.

Linking one relation to another typically involves an attribute that is common to both relations. The common attributes are usually a primary key from one table and a foreign key from the other. Referential integrity rules dictate that foreign key values in one relation reference the primary key values in another relation. Foreign keys might also reference the primary key of the same relation. Figure illustrates two foreign key relationships.



Oracle and Client/Server


Oracle Corporation's reputation as a database company is firmly established in its full-featured, high-performance RDBMS server. With the database as the cornerstone of its product line, Oracle has evolved into more than just a database company, complementing its RDBMS server with a rich offering of well-integrated products that are designed specifically for distributed processing and client/server applications. As Oracle's database server has evolved to support large-scale enterprise systems for transaction processing and decision support, so too have its other products, to the extent that Oracle can provide a complete solution for client/server application development and deployment. This chapter presents an overview of client/server database systems and the Oracle product architectures that support their implementation.

An Overview of Client/Server Computing

The premise of client/server computing is to distribute the execution of a task among multiple processors in a network. Each processor is dedicated to a specific, focused set of subtasks that it performs best, and the end result is increased overall efficiency and effectiveness of the system as a whole. Splitting the execution of tasks between processors is done through a protocol of service requests; one processor, the client, requests a service from another processor, the server. The most prevalent implementation of client/server processing involves separating the user interface portion of an application from the data access portion.

On the client, or front end, of the typical client/server configuration is a user workstation operating with a Graphical User Interface (GUI) platform, usually Microsoft Windows, Macintosh, or Motif. At the back end of the configuration is a database server, often managed by a UNIX, Netware, Windows NT, or VMS operating system.

Client/server architecture also takes the form of a server-to-server configuration. In this arrangement, one server plays the role of a client, requesting database services from another server. Multiple database servers can look like a single logical database, providing transparent access to data that is spread around the network.

Designing an efficient client/server application is somewhat of a balancing act, the goal of which is to evenly distribute execution of tasks among processors while making optimal use of available resources. Given the increased complexity and processing power required to manage a graphical user interface (GUI) and the increased demands for throughput on database servers and networks, achieving the proper distribution of tasks is challenging. Client/server systems are inherently more difficult to develop and manage than traditional host-based application systems because of the following challenges:

The components of a client/server system are distributed across more varied types of processors. There are many more software components that manage client, network, and server functions, as well as an array of infrastructure layers, all of which must be in place and configured to be compatible with each other.

The complexity of GUI applications far outweighs that of their character-based predecessors. GUIs are capable of presenting much more information to the user and providing many additional navigation paths to elements of the interface.

Troubleshooting performance problems and errors is more difficult because of the increased number of components and layers in the system.

Databases in a Client/Server Architecture

Client/server technologies have changed the look and architecture of application systems in two ways. Not only has the supporting hardware architecture undergone substantial changes, but there have also been significant changes in the approach to designing the application logic of the system.

Prior to the advent of client/server technology, most Oracle applications ran on a single node. Typically, a character-based SQL*Forms application would access a database instance on the same machine with the application and the RDBMS competing for the same CPU and memory resources. Not only was the system responsible for supporting all the database processing, but it was also responsible for executing the application logic. In addition, the system was burdened with all the I/O processing for each terminal on the system; each keystroke and display attribute was controlled by the same processor that processed database requests and application logic.

Client/server systems change this architecture considerably by splitting all of the interface management and much of the application processing from the host system processor and distributing it to the client processor.

Combined with the advances in hardware infrastructure, the increased capabilities of RDBMS servers have also contributed to changes in the application architecture. Prior to the release of Oracle7, Oracle's RDBMS was less sophisticated in its capability to support the processing logic necessary to maintain the integrity of data in the database. For example, primary and foreign key checking and enforcement was performed by the application. As a result, the database was highly reliant on application code for enforcement of business rules and integrity, making application code bulkier and more complex. Figure 2.1 illustrates the differences between traditional host-based applications and client/server applications. Client/server database applications can take advantage of the Oracle7 server features for implementation of some of the application logic.