4.1 An Introduction to Databases
A database-management system (DBMS) consists of a collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as the database, contains information about one particular enterprise. The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving and storing database information.
Database systems are designed to manage large bodies of information. The management of data involves both the definition of structures for the storage of information and the provision of mechanisms for the manipulation of information. In addition, the database system must provide for the safety of the information stored, despite system crashes or attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible anomalous results. The importance of information in most organizations, which determines the value of the database, has led to the development of a large body of concepts and techniques for the efficient management of data.
A database schema is specified by a set of definitions expressed in a data-definition language (DDL). The result of compiling DDL statements is a set of tables that is stored in a special file called the data dictionary, or data directory. A data dictionary is a file that contains metadata, that is, data about data. This file is consulted before actual data are read or modified in the database system. The storage structure and access methods used by the database system are specified by a set of definitions in a special type of DDL called a data storage and definition language. The result of compiling these definitions is a set of instructions that specify the implementation details of the database schemas, details that are usually hidden from the users.
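As a minimal sketch of the idea, the following Python fragment uses the standard-library sqlite3 module to run one DDL statement and then read back what the system recorded about it. The account table and its columns are hypothetical names chosen for illustration. SQLite keeps its schema information in a catalog table named sqlite_master, which here plays the role of the data dictionary; other database systems organize their dictionaries differently.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# A DDL statement: it defines structure and stores no account data.
# The table and column names are illustrative assumptions.
conn.execute(
    "CREATE TABLE account ("
    " account_number TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)

# SQLite records each schema object in the catalog table sqlite_master.
# Querying it returns metadata (data about data), not account rows.
rows = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'account'"
).fetchall()

for obj_type, name, sql in rows:
    print(obj_type, name)
    print(sql)

conn.close()
```

The query returns a single row describing the table, including the text of the definition that created it, even though no account data have been inserted.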
Transaction Management
A transaction is a collection of operations that performs a single logical function in a database application. Each transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not violate any database-consistency constraints. That is, if the database was consistent when a transaction started, the database must be consistent when the transaction successfully terminates. However, during the execution of a transaction, it may be necessary to allow inconsistency temporarily. This temporary inconsistency, although necessary, may lead to difficulty if a failure occurs.
It is the responsibility of the programmer to define the various transactions properly, so that each preserves the consistency of the database. For example, the transaction to transfer funds from account A to account B could be defined as two separate programs: one that debits account A, and another that credits account B. The execution of these two programs one after the other will indeed preserve consistency. However, each program by itself does not transform the database from a consistent state to a new consistent state. Thus, those programs are not transactions.
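The funds-transfer example can be sketched in Python with the standard-library sqlite3 module. The table layout, the starting balances, and the helper names total and transfer are assumptions made for illustration, and the consistency constraint assumed here is that the sum of the two balances does not change. The debit and the credit are grouped into one transaction, so only the pair together moves the database from one consistent state to another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()


def total(conn):
    """Sum of all balances: the quantity the transfer must preserve."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


def transfer(conn, source, target, amount):
    """Debit one account and credit another as a single transaction."""
    # "with conn" commits if the block completes and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        # Between the two updates the total is temporarily short by `amount`.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )


before = total(conn)
transfer(conn, "A", "B", 30)
after = total(conn)
print(before, after)  # 150 150
```

Running only the first UPDATE would leave the total at 120, which is why the debit alone does not qualify as a transaction.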
Ensuring the atomicity and durability properties is the responsibility of the database system itself, specifically, of the transaction management component. In the absence of failures, all transactions complete successfully, and atomicity is achieved easily. However, due to various types of failure, a transaction may not always complete its execution successfully. If we are to ensure the atomicity property, a failed transaction must have no effect on the state of the database. Thus, the database must be restored to the state in which it was before the transaction in question started executing. It is the responsibility of the database system to detect system failures and to restore the database to a state that existed prior to the occurrence of the failure.
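A small sketch of atomicity, again using Python's standard-library sqlite3 module with a hypothetical account table. The failure is simulated by raising an exception after the debit, which is an assumption made to keep the example short. It stands in for the failures described above and does not exercise recovery from a real system crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


def transfer_with_failure(conn, amount):
    """Debit account A, then fail before account B is credited."""
    # "with conn" rolls the open transaction back when the block raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'A'",
            (amount,),
        )
        raise RuntimeError("simulated failure after the debit")


before = balances(conn)
try:
    transfer_with_failure(conn, 30)
except RuntimeError:
    pass  # the transaction failed; its partial effect must not survive
after = balances(conn)

print(before)  # {'A': 100, 'B': 50}
print(after)   # {'A': 100, 'B': 50}
```

The debit was executed, yet the balances afterward match the balances before: the failed transaction has had no effect on the state of the database.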
Storage Management
Databases typically require a large amount of storage space. Corporate databases are usually measured in terms of gigabytes or, for the largest databases, terabytes of data. A gigabyte is 1000 megabytes (1 billion bytes), and a terabyte is 1 million megabytes (1 trillion bytes). Since the main memory of computers cannot store this much information, the information is stored on disks. Data are moved between disk storage and main memory as needed. Since the movement of data to and from disk is slow relative to the speed of the central processing unit, it is imperative that the database system structure the data so as to minimize the need to move data between disk and main memory.
The goal of a database system is to simplify and facilitate access to data. High-level views help to achieve this goal. Users of the system should not be burdened unnecessarily with the physical details of the implementation of the system. Nevertheless, a major factor in a user's satisfaction or lack thereof with a database system is that system's performance. If the response time for a request is too long, the value of the system is diminished. The performance of a system depends on the efficiency of the data structures used to represent the data in the database, and on how efficiently the system is able to operate on these data structures. As is the case elsewhere in computer systems, a tradeoff must be made not only between space and time, but also between the efficiency of one kind of operation and that of another.
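The tradeoff between one kind of operation and another can be illustrated with two deliberately simple in-memory structures, written in Python. The class names UnsortedStore and SortedStore are hypothetical, and real database systems use far more elaborate structures. The point is only that the choice of representation makes one operation cheaper at the expense of another.

```python
import bisect


class UnsortedStore:
    """Keys kept in arrival order: cheap insertion, linear-scan lookup."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        self.keys.append(key)  # constant time on average

    def contains(self, key):
        return key in self.keys  # examines up to every stored key


class SortedStore:
    """Keys kept in sorted order: binary-search lookup, costlier insertion."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        # Finds the position by binary search, then shifts later keys over.
        bisect.insort(self.keys, key)

    def contains(self, key):
        i = bisect.bisect_left(self.keys, key)  # logarithmic search
        return i < len(self.keys) and self.keys[i] == key


unsorted_store, sorted_store = UnsortedStore(), SortedStore()
for key in [42, 7, 19, 3]:
    unsorted_store.insert(key)
    sorted_store.insert(key)

print(unsorted_store.keys)  # [42, 7, 19, 3]
print(sorted_store.keys)    # [3, 7, 19, 42]
print(unsorted_store.contains(19), sorted_store.contains(19))  # True True
```

Both structures answer the same queries. Which one is preferable depends on whether the workload is dominated by insertions or by lookups.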
A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. The storage manager is responsible for the interaction with the file manager. The raw data are stored on the disk using the file system, which is usually provided by a conventional operating system. The storage manager translates the various data-manipulation language (DML) statements into low-level file system commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in the database.
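To make the translation into file system commands concrete, here is a toy sketch in Python. The class ToyStorageManager, its methods, and the fixed 12-byte record layout are all assumptions invented for illustration; an actual storage manager also handles buffering, indexing, and recovery, none of which appear here. Each high-level operation becomes a seek followed by a read or a write on an ordinary file.

```python
import os
import struct
import tempfile

# Hypothetical record layout: 4-byte account id, then 8-byte balance,
# little-endian with no padding (12 bytes per record).
RECORD = struct.Struct("<iq")


class ToyStorageManager:
    """Maps record-level operations onto seek/read/write calls on one file."""

    def __init__(self, path):
        open(path, "ab").close()       # create the file if it is missing
        self.file = open(path, "r+b")  # reopen for binary read and write

    def insert(self, account_id, balance):
        """Append a record and return its slot number."""
        self.file.seek(0, os.SEEK_END)
        slot = self.file.tell() // RECORD.size
        self.file.write(RECORD.pack(account_id, balance))
        return slot

    def read(self, slot):
        """Return the (account_id, balance) tuple stored in a slot."""
        self.file.seek(slot * RECORD.size)
        return RECORD.unpack(self.file.read(RECORD.size))

    def update(self, slot, account_id, balance):
        """Overwrite the record stored in a slot."""
        self.file.seek(slot * RECORD.size)
        self.file.write(RECORD.pack(account_id, balance))

    def close(self):
        self.file.close()


with tempfile.TemporaryDirectory() as tmp:
    manager = ToyStorageManager(os.path.join(tmp, "accounts.dat"))
    slot = manager.insert(101, 500)  # storing
    manager.update(slot, 101, 450)   # updating
    record = manager.read(slot)      # retrieving
    manager.close()

print(record)  # (101, 450)
```

The caller works with slots and records, while the byte offsets and file calls stay inside the module, which is the separation between application programs and low-level data that the paragraph describes.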