
Introduction


Purpose of Database Systems

Characteristics of a DBMS (Database Management System):

  • Efficiency and scalability in data access
  • Reduced application development time
  • Data independence (including physical data independence and logical data independence)
  • Data integrity and security
  • Concurrent access and robustness (i.e., recovery)

Conventional file-processing systems suffer from the following problems:

  • Data redundancy and inconsistency

    Many different file formats exist, and the same data is stored repeatedly in different files.

  • Difficulty in accessing data

    A new program has to be written for every new task.

  • Data isolation

    Data is scattered across different files, which makes it hard to share.

  • Integrity problems

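    Integrity constraints (for example, checking that a balance is non-negative) are buried in program code, so it is hard to add new constraints to existing programs and files.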

  • Atomicity problems

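    In a file system it is hard to guarantee that a group of related operations either all complete or have no effect at all.

    Failures may leave the database in an inconsistent state, with only part of an update carried out.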

  • Concurrent-access anomalies

    In a file system it is hard to guarantee that multiple users accessing the data at the same time do not conflict with one another.

  • Security problems

    In a file system it is hard to guarantee that only authorized users can access the data (i.e., the right person uses the right data).

A database system provides solutions to all of the problems above.
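As an illustration of the integrity point, here is a minimal sketch in which the non-negative-balance rule is declared once in the schema instead of being repeated in every program. It assumes SQLite through Python's standard `sqlite3` module; the `account` table and the `try_withdraw` helper are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# The rule "balance must not be negative" is part of the schema itself.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.commit()

def try_withdraw(amount):
    """Return True if the database accepted the withdrawal."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the update.
        return False

accepted_small = try_withdraw(30)   # leaves a balance of 70
accepted_large = try_withdraw(500)  # would go negative, so it is refused
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

Any program that writes to this table is subject to the same constraint, so adding a new rule means changing the schema and not every application.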

View of Data

Three-level abstraction of databases

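  • Physical level: describes how the data is stored on disk
  • Logical level: describes what data is stored in the database and the relationships among those data
  • View level: different kinds of users see different parts of the data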

Schemas and Instances

  • Schema – the structure of the database at each level of abstraction
  • Instance – the actual content of the database at a particular point in time

The relationship between the two is analogous to that between a type and a variable in a programming language:

  • type \(\leftrightarrow\) schema, variable \(\leftrightarrow\) instance
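A small sketch of the distinction, assuming SQLite via Python's `sqlite3` module and a hypothetical `student` table: the schema is the same before and after the inserts, while the instance changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

def schema():
    # Column names and declared types: the structure of the table.
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

def instance():
    # The rows stored right now: the content at this point in time.
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO student VALUES (1, 'Alice')")
conn.execute("INSERT INTO student VALUES (2, 'Bob')")
schema_after, instance_after = schema(), instance()
```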

Data Independence

Ability to modify a schema definition at one level without affecting a schema definition at a higher level.

  • Physical Data Independence – the ability to modify the physical schema without changing the logical schema
  • Logical Data Independence – the ability to modify the logical schema without changing the user view schema
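Both kinds of independence can be sketched with SQLite through Python's `sqlite3` module. The table, view, and index names here are hypothetical. The "application" queries only a view; adding an index stands in for a physical change, and splitting the base table stands in for a logical change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO instructor VALUES (1, 'Ada', 90000), (2, 'Grace', 95000);
    CREATE VIEW instructor_public AS SELECT id, name FROM instructor;
""")

def app_query():
    # The application only ever reads from the view.
    return conn.execute(
        "SELECT id, name FROM instructor_public ORDER BY id"
    ).fetchall()

baseline = app_query()

# Physical change: add an index. The logical schema and the query are untouched.
conn.execute("CREATE INDEX idx_instructor_name ON instructor (name)")
after_index = app_query()

# Logical change: move salary into its own table, then redefine the view
# so that it still exposes the same columns under the same name.
conn.executescript("""
    DROP VIEW instructor_public;
    CREATE TABLE instructor_core (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instructor_pay (id INTEGER PRIMARY KEY, salary INTEGER);
    INSERT INTO instructor_core SELECT id, name FROM instructor;
    INSERT INTO instructor_pay SELECT id, salary FROM instructor;
    DROP TABLE instructor;
    CREATE VIEW instructor_public AS SELECT id, name FROM instructor_core;
""")
after_split = app_query()
```

In all three cases `app_query` is the same code and returns the same rows, which is the property the two definitions describe.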

Data Models

A data model is a collection of tools for describing data structure, data relationships, data semantics, and data constraints.

Different levels of data abstraction call for different data models:

  • Entity-Relationship model
  • Relational model
  • Object-based data model

    • Object-oriented model
    • Object-relational model
  • Semi-structured data model (XML)

  • Other older models:

    • Network model
    • Hierarchical model

Database Language

Database languages fall into three main categories:

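  • Data Definition Language (DDL): Specification notation for defining the database schema.

    Defines the database structure, access methods, consistency constraints, and so on; for example, creating, dropping, and altering tables.

  • Data Manipulation Language (DML): Language for accessing and manipulating the data in the database.

    Operates on the data stored in the database; for example, inserting, deleting, and updating data.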

  • Data Control Language (DCL)

Data Definition Language (DDL)

DDL statements are compiled, resulting in a set of tables stored in a special file called the data dictionary.

The data dictionary contains metadata (i.e., data about data) describing:

  • Database schema
  • Integrity constraints

    • Primary Key
    • Referential integrity
  • Authorization
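To make the idea of a data dictionary concrete, the sketch below runs two DDL statements and then reads back the metadata that the system recorded. It assumes SQLite via Python's `sqlite3` module, where the catalog is exposed as the `sqlite_master` table; other systems use different catalog names. The `department` and `course` tables are hypothetical. SQLite has no user accounts, so the authorization part of the dictionary is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY);
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        dept_name TEXT REFERENCES department (dept_name)
    );
""")

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for name, _ in catalog]

# Referential-integrity metadata recorded for the course table.
foreign_keys = [
    (row[2], row[3], row[4])  # (referenced table, from column, to column)
    for row in conn.execute("PRAGMA foreign_key_list(course)")
]
```

The `sql` column of each catalog row holds the original `CREATE TABLE` text, so the schema and its constraints can be recovered from the dictionary alone.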

Data Manipulation Language (DML)

Two classes of languages

  • Procedural – the user specifies what data is required and how to get it (e.g., C)
  • Declarative (nonprocedural) – the user specifies what data is required without specifying how to get it (e.g., SQL)
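The contrast between the two classes can be sketched by answering the same question twice: once with an explicit loop, and once with a SQL query run through Python's `sqlite3` module. The data and the `instructor` table are made up for the example.

```python
import sqlite3

rows = [("Ada", "CS", 90000), ("Grace", "CS", 95000), ("Emmy", "Math", 80000)]

# Procedural style: spell out how to find the answer, step by step.
procedural = []
for name, dept, salary in rows:
    if dept == "CS" and salary > 92000:
        procedural.append(name)

# Declarative style: state what is wanted and let the DBMS decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?)", rows)
declarative = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM instructor WHERE dept = 'CS' AND salary > 92000"
    )
]
```

Both versions produce the same answer, but only the first fixes the access strategy; the second leaves the system free to scan, use an index, or reorder work.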

SQL is the most widely used query language

SQL (Structured Query Language)

SQL = DDL + DML + DCL

SQL is the most widely used non-procedural query language.

Database Design

Entity-Relationship (E-R) Model

Relational Model

A Sample of Relational Model: University Database

Database Users and Administrators

Users are differentiated by the way they expect to interact with the system.

  • Naive users – invoke one of the permanent application programs that have been written previously in a high-level language
  • Application programmers – interact with the system via SQL calls
  • Sophisticated users – form requests in a database query language
  • Specialized users – write specialized database applications that do not fit into the traditional data-processing framework
  • Database administrator (DBA): a special user who has central control over the database and the programs that access its data.

Transaction Management

  • Concurrent use/access is important, but causes problems/conflict.
  • A transaction is a collection of operations that performs a single logical function in a database application.
  • Transaction requirements include atomicity, consistency, isolation, and durability.
  • The transaction-management component ensures that the database remains in a consistent (or correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.
  • Concurrency-control manager controls the interaction among the concurrent transactions.
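Atomicity can be sketched with a funds transfer, assuming SQLite via Python's `sqlite3` module. The `account` table and the `transfer` helper are hypothetical. The credit and the debit form one transaction, so if the debit fails the credit is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back together with it.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

first = transfer("A", "B", 30)    # succeeds
second = transfer("A", "B", 500)  # fails and leaves no partial update
final = balances()
```

The failed transfer leaves both balances exactly as they were after the first one, so the total amount of money is preserved. This sketch covers a single connection only; isolation between concurrent transactions is a separate concern.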

Database Architecture

The database architecture consists of three main parts:

  • Storage manager
  • Query processor
  • Database administrator

Storage Manager

The storage manager is the module that provides the interface between the low-level data and application programs. It includes:

  • Transaction manager
  • Authorization and integrity manager
  • File manager (interaction with the file system to process data files, data dictionary, and index files)
  • Buffer manager

Query Processor

The query processor includes a DDL interpreter, a DML compiler, and query processing, which proceeds in three steps:

  • Parsing and translation
  • Optimization
  • Evaluation
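The three steps can be observed from the outside in SQLite, used here through Python's `sqlite3` module as one example system. `EXPLAIN QUERY PLAN` asks the engine to parse and optimize a statement and report the plan it chose, without evaluating it. The table and index names are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student (name)")

query = "SELECT id, dept FROM student WHERE name = ?"

# Parsing and optimization: obtain the chosen plan without running the query.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("Ada",)).fetchall()
plan_text = " | ".join(row[-1] for row in plan_rows)  # last column describes the step

# Evaluation: actually execute the query against the data.
conn.execute("INSERT INTO student VALUES (1, 'Ada', 'CS')")
result = conn.execute(query, ("Ada",)).fetchall()
```

Printing `plan_text` should show that the optimizer chose to search through `idx_student_name` instead of scanning the whole table.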
