SQL Server Architecture
Components of the SQL Server Engine
Figure
1-1 shows the general architecture of SQL Server, which has four major
components (three of whose subcomponents are listed): protocols, the relational
engine (also called the Query Processor), the storage engine, and the SQLOS.
Every batch submitted to SQL Server for execution, from any client application,
must interact with these four components. (For simplicity, I’ve made some minor
omissions and simplifications and ignored certain “helper” modules among the
subcomponents.)
The
protocol layer receives the request and translates it into a form that the
relational engine can work with, and it also takes the final results of any
queries, status messages, or error messages and translates them into a form the
client can understand before sending them back to the client. The relational
engine layer accepts SQL batches and determines what to do with them. For
Transact-SQL queries and programming constructs, it parses, compiles, and
optimizes the request and oversees the process of executing the batch. As the
batch is executed, if data is needed, a request for that data is passed to the
storage
engine. The storage
engine manages all data access, both through transaction-based commands and
bulk operations such as backup, bulk insert, and certain DBCC (Database
Consistency Checker) commands. The SQLOS layer handles activities that are
normally considered to be operating system responsibilities, such as thread
management (scheduling), synchronization primitives, deadlock detection, and memory
management, including the buffer pool.
SQL
Server can be configured to support multiple protocols simultaneously, coming
from different clients. Each client connects to SQL Server with a single
protocol. If the client program does not know which protocols SQL Server is listening
on, you can configure the client to attempt multiple protocols sequentially.
The following protocols are available:
Shared Memory The simplest protocol to use, with no
configurable settings.
Clients using the Shared Memory protocol can connect
only to a SQL Server instance running on the same computer, so this protocol is
not useful for most database activity. Use this protocol for troubleshooting
when you suspect that the other
protocols are configured incorrectly. Clients
using MDAC 2.8 or earlier cannot use the Shared Memory protocol. If such a
connection is attempted, the client is switched to the Named Pipes protocol.
Named Pipes A protocol developed for local area networks
(LANs). A portion of memory is used by one process to pass information to
another process, so that the output of one is the input of the other. The
second process can be local (on the same computer as the first) or remote (on a
networked computer).
TCP/IP
The most widely
used protocol over the Internet. TCP/IP can communicate across interconnected
networks of computers with diverse hardware architectures
and operating systems. It includes
standards for routing network traffic and offers
advanced security features. Enabling SQL Server
to use TCP/IP requires the most configuration effort, but most networked
computers are already properly configured.
Virtual Interface Adapter (VIA) A protocol that works with VIA hardware. This is
a specialized protocol; configuration details are available from your hardware
vendor.
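If you are not sure which protocol a given connection ended up using, you can ask the server. The sketch below uses the sys.dm_exec_connections dynamic management view (available in SQL Server 2005 and later); the net_transport column reports values such as Shared memory, Named pipe, or TCP.

-- Report the protocol used by the current connection.
SELECT session_id,
       net_transport,   -- 'Shared memory', 'Named pipe', or 'TCP'
       auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;

A client can also force a particular protocol by prefixing the server name in its connection string, for example tcp: for TCP/IP, np: for Named Pipes, or lpc: for Shared Memory.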
Tabular Data Stream Endpoints
SQL
Server 2005 also introduces a new concept for defining SQL Server connections:
the connection is represented on the server end by a TDS endpoint. During setup,
SQL Server creates an endpoint for each of the four Net-Library protocols
supported by SQL Server, and if the protocol is enabled, all users have access
to it. For disabled protocols, the endpoint still exists but cannot be used. An additional endpoint is created for
the dedicated administrator
connection (DAC), which can be used only by members of the sysadmin fixed server
role. (I’ll discuss the DAC in more detail in the configuration chapter.)
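As a quick check, the endpoints that setup created, including the DAC endpoint, can be listed from the sys.endpoints catalog view. This is only a sketch; the exact endpoint names vary by version and configuration.

-- List the TDS endpoints and whether each is started or stopped.
SELECT name,
       protocol_desc,   -- TCP, NAMED_PIPES, SHARED_MEMORY, VIA
       state_desc       -- STARTED or STOPPED
FROM sys.endpoints
WHERE type_desc = 'TSQL';

To use the DAC endpoint itself, a member of sysadmin can connect with sqlcmd -A or by prefixing the server name with ADMIN: in Management Studio.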
The Relational Engine
As mentioned earlier, the relational engine is
also called the query processor. It includes the components of SQL Server that
determine exactly what your query needs to do and the best way to do it. By far
the most complex component of the query processor, and maybe even
of the entire SQL Server product, is the
query optimizer, which determines the best execution plan for the queries in
the batch.
The
relational engine also manages the execution of queries as it requests data
from the storage engine and processes the results returned. Communication
between the relational engine and the storage engine is generally in terms of
OLE DB row sets. (Row set is the OLE
DB term for a result set.) The
storage engine comprises the components needed to actually access and modify
data on disk.
The Command Parser
The
command parser handles Transact-SQL language events sent to SQL Server. It
checks for proper syntax and translates Transact-SQL commands into an internal
format that can be operated on. This internal format is known as a query tree. If the parser doesn’t
recognize the syntax, a syntax error is immediately raised that identifies
where the error occurred. However, non-syntax error messages cannot be explicit
about the exact source line that caused the error. Because only the command
parser can access the source of the statement, the statement is no longer
available in source format when the command is actually executed.
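You can watch the parser work in isolation with SET PARSEONLY, which hands a batch to the command parser without compiling or executing it. In this sketch the table name is deliberately fictitious: a missing object is not a parser error, because names are not resolved until compilation.

SET PARSEONLY ON;
GO
SELECT name FROM dbo.TableThatDoesNotExist;   -- parses cleanly; the missing
                                              -- table would surface at compile time
GO
SELECT name FROM WHERE object_id = 1;         -- raises a syntax error from the parser
GO
SET PARSEONLY OFF;
GO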
The Query Optimizer
The
query optimizer takes the query tree from the command parser and prepares it
for execution. Statements that can’t be optimized, such as flow-of-control and
DDL commands, are compiled into an internal form. The statements that are
optimizable are marked as such and then passed to the optimizer. The optimizer
is mainly concerned with the DML statements SELECT,
INSERT, UPDATE, and DELETE, which can be processed in more than one way,
and it is the optimizer’s job to determine which of the many possible ways is
the best. It compiles an entire command batch, optimizes queries that are
optimizable, and checks security. The query optimization and compilation result
in an execution plan.
The
first step in producing such a plan is to normalize
each query, which potentially breaks down a single query into multiple,
fine-grained queries. After the optimizer normalizes a query, it optimizes it, which means it determines
a plan for executing that query. Query optimization is cost based; the optimizer
chooses the plan that it determines would cost the least based on internal
metrics that include estimated memory requirements, CPU utilization, and number
of required I/Os. The optimizer considers the type of statement requested,
checks the amount of data in the various tables affected, looks at the indexes
available for each table, and then looks at a sampling of the data values kept
for each index or column referenced in the query. The sampling of the data
values is called distribution statistics.
Based on the available information, the optimizer considers the various access
methods
and processing strategies it could use to resolve a query and chooses the most
cost-effective plan.
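Two ways to look over the optimizer's shoulder: SET SHOWPLAN_XML returns the estimated plan without executing the query, and DBCC SHOW_STATISTICS displays the distribution statistics used for costing. The table, column, and index names below are placeholders for your own objects.

-- Return the estimated execution plan only; the query is not executed.
SET SHOWPLAN_XML ON;
GO
SELECT SomeColumn FROM dbo.SomeTable WHERE SomeColumn = 42;
GO
SET SHOWPLAN_XML OFF;
GO

-- Show the distribution statistics (histogram, density) for one index.
DBCC SHOW_STATISTICS ('dbo.SomeTable', 'IX_SomeTable_SomeColumn');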
The SQL Manager
The
SQL manager is responsible for everything related to managing stored procedures
and their plans. It determines when a stored procedure needs recompilation, and
it manages the caching of procedure plans so that other processes can reuse
them.
The
SQL manager also handles auto parameterization of queries. In SQL Server 2008,
certain kinds of ad hoc queries are treated as if they were parameterized
stored procedures, and query plans are generated and saved for them. SQL Server
can save and reuse plans in several other ways, but in some situations using a
saved plan might not be a good idea.
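You can observe plan caching and reuse, including auto-parameterized ad hoc queries, through the plan-cache DMVs introduced in SQL Server 2005. This is only a sketch; on a busy server you would add filters.

-- Cached plans and how often each has been reused. Auto-parameterized
-- ad hoc queries appear with objtype = 'Prepared' and a parameter marker
-- in place of the original literal.
SELECT cp.usecounts,
       cp.objtype,
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;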
The Database Manager
The
database manager handles access to the metadata needed for query compilation
and optimization, making it clear that none of these separate modules can be
run completely separately from the others. The metadata is stored as data and
is managed by the storage engine, but metadata elements such as the data types
of columns and the available indexes on a table must be available during the
query compilation and optimization phase, before actual query execution starts.
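The same kind of metadata the database manager consults at compile time, column data types and available indexes, is also exposed to us through catalog views. The table name in this sketch is a placeholder.

-- Column names and data types for one table.
SELECT c.name AS column_name, t.name AS type_name, c.max_length
FROM sys.columns AS c
JOIN sys.types AS t ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.SomeTable');

-- Indexes available on the same table.
SELECT name AS index_name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.SomeTable');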
The Query Executor
The
query executor runs the execution plan that the optimizer produced, acting as a
dispatcher for all the commands in the execution plan. This module steps
through each command of the execution plan until the batch is complete. Most of
the commands require interaction with the storage engine to modify or retrieve
data and to manage transactions and locking.
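To see the plan the executor actually ran, with per-operator row counts, you can request the runtime plan alongside the results. A minimal sketch:

-- Return the actual execution plan together with the query results.
SET STATISTICS XML ON;
GO
SELECT COUNT(*) FROM sys.objects;
GO
SET STATISTICS XML OFF;
GO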
The Storage Engine
The
SQL Server storage engine has traditionally been considered to include all the
components involved with the actual processing of data in your database. SQL
Server 2005 separates out some of these components into a module called the
SQLOS. In fact, the SQL Server storage engine team at Microsoft
encompasses three areas: access methods, transaction management, and the SQLOS.
Transaction Services
A
core feature of SQL Server is its ability to ensure that transactions are atomic, that is, all or nothing. In
addition, transactions must be durable, which means that if a transaction has
been committed, it must be recoverable by SQL Server no matter what, even if a
total system failure occurs 1 millisecond after the commit was acknowledged.
There are actually four properties that transactions must adhere to, called the
ACID properties: atomicity, consistency, isolation, and durability.
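Here is a minimal sketch of atomicity using an explicit transaction; the dbo.Accounts table and its columns are hypothetical. Either both updates persist or neither does.

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE dbo.Accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT TRANSACTION;        -- once acknowledged, both changes are durable
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- atomicity: neither update persists
END CATCH;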
Locking Operations
Locking is a crucial function of a multi-user
database system such as
SQL
Server, even if you are operating primarily in the snapshot isolation level with
optimistic
concurrency. SQL Server lets you manage multiple users simultaneously and
ensures that the transactions observe the properties of the chosen isolation
level. Even though readers will not block writers and writers will not block
readers in snapshot isolation, writers do acquire locks and can still block
other writers, and if two writers try to change the same data concurrently, a
conflict will occur that must be resolved. The locking code acquires and
releases various types of locks, such as share locks for reading, exclusive
locks for writing, intent locks taken at a higher granularity to signal a
potential “plan” to perform some operation, and extent locks for space
allocation. It manages compatibility between the lock types, resolves
deadlocks, and escalates locks if needed. The locking code controls table,
page, and row locks as well as system data locks.
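The locks held or requested at any moment can be inspected through sys.dm_tran_locks (SQL Server 2005 and later); a rough sketch:

-- Current lock requests: who holds what, at which granularity, in which mode.
SELECT request_session_id,
       resource_type,     -- OBJECT, PAGE, KEY, EXTENT, ...
       request_mode,      -- S, X, IS, IX, U, ...
       request_status     -- GRANT, WAIT, CONVERT
FROM sys.dm_tran_locks
ORDER BY request_session_id;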
The SQLOS
Whether
the components of the SQLOS layer are actually part of the storage engine
depends on whom you ask. In addition,
trying to figure out exactly which components are in the SQLOS layer can be
rather like herding cats. I have seen several technical presentations
on the topic at conferences and have exchanged
e-mail and even spoken face to face with members of the product team, but the
answers vary. The manager who said he was
responsible for the SQLOS layer defined
the SQLOS as everything he was responsible for, which is a rather circular
definition. Earlier versions of SQL Server had a thin layer of
interfaces between the storage engine and
the actual operating system through which SQL
Server made calls to the OS for memory
allocation, scheduler resources, thread and worker management, and synchronization
objects. However, the services in SQL Server that
needed to access these interfaces could be
in any part of the engine. SQL Server requirements for managing memory,
schedulers, synchronization objects, and so forth have
become more complex. Rather than each
part of the engine growing to support the
increased
functionality, all services in SQL Server that need this OS access have been
grouped together into a single functional unit called the SQLOS. In general,
the SQLOS is like an operating system inside SQL Server. It provides memory
management, scheduling, I/O management, a framework for locking and transaction
management, deadlock detection, and general utilities for dumping, exception
handling, and so on.
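Much of this layer is visible through the sys.dm_os_* dynamic management views. The sketch below uses column names as they appear in SQL Server 2005 and 2008; later versions rename some of them.

-- One scheduler per CPU, plus hidden internal schedulers (excluded here).
SELECT scheduler_id, cpu_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;

-- Memory clerks: which parts of the engine are consuming memory.
SELECT TOP (10) type,
       SUM(single_pages_kb + multi_pages_kb) AS kb_allocated
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY kb_allocated DESC;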
Another
member of the product team described the SQLOS to me as a set of data
structures and APIs that could potentially be needed by operations running at
any layer of the engine. For example, consider various operations that require
use of memory. SQL Server doesn’t just need memory when it reads in data pages
through the storage engine; it also needs memory to hold query plans developed
in the query processor layer. Figure 1-1 (shown earlier) depicts the SQLOS
layer in several parts, but this is just a way of showing that
many SQL Server components use SQLOS
functionality.