SQL Server architecture can be divided into two parts: the Storage Engine and the Relational Engine.
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuRSiC484Y4lYvn-U4-fgNkG2tFWiO33E9WwfhOn44SX_UpjspW0J-M3c_FzZl5NbSP41dPlA1EvPkoHbFwQj8ldwPsI8Gv_OMHUH7fWxJI8iGCeE-96RrMdhFQUGKksvOnraLMkSqpzBH/s1600/arch_resize-300x227.jpg)
The Relational Engine is made up of three main components:
1. Command Parser
2. Query Optimizer
3. Query Executor
Before discussing the Relational Engine, we need to cover the Protocol Layer and the TDS communication endpoints.
TDS (Tabular Data Stream) Endpoints:
TDS is a Microsoft-proprietary protocol originally designed by Sybase that is used to interact with a database server. Once a connection has been made using a network protocol such as TCP/IP, a link is established to the relevant TDS endpoint that then acts as the communication point between the client and the server.
There is one TDS endpoint for each network protocol and an additional one reserved for use by the dedicated administrator connection (DAC). Once connectivity is established, TDS messages are used to communicate between the client and the server. The SELECT statement is sent to the SQL Server as a TDS message across a TCP/IP connection (TCP/IP is the default protocol).
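As a quick sanity check, you can ask SQL Server which protocol and TDS endpoint your own connection is using; a minimal sketch using the sys.dm_exec_connections and sys.endpoints views:

```sql
-- Show the network protocol and TDS endpoint used by the current connection
SELECT c.session_id,
       c.net_transport,      -- e.g. TCP, Shared memory, Named pipe
       c.protocol_type,      -- TSQL for TDS connections
       e.name AS endpoint_name
FROM sys.dm_exec_connections AS c
JOIN sys.endpoints AS e
    ON c.endpoint_id = e.endpoint_id
WHERE c.session_id = @@SPID;
```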
Protocol Layer:
When the protocol layer in SQL Server receives your TDS packet, it has to reverse the work of the SNI at the client and unwrap the packet to find out what request it contains. The protocol layer
is also responsible for packaging up results and status messages to send back to the client as TDS messages. Our SELECT statement is marked in the TDS packet as a message of type “SQL Command,” so it’s passed on to the next component, the Query Parser, to begin the path toward execution.
In short: at the client, the statement was wrapped in a TDS packet by the SQL Server Network Interface (SNI) and sent to the protocol layer on the SQL Server, where it was unwrapped, identified as a SQL Command, and handed to the Command Parser.
The Relational Engine:
The Relational Engine is also sometimes called the query processor because its primary function is query optimization and execution. It contains a Command Parser to check query syntax and prepare query trees, a Query Optimizer that is arguably the crown jewel of any database system, and a Query Executor responsible for execution.
Command Parser:
The Command Parser’s role is to handle T-SQL language events. It first checks the syntax and returns any errors back to the protocol layer to send to the client. If the syntax is valid, then the next step is to generate a query plan or find an existing plan. A Query plan contains the details about how SQL Server is going to execute a piece of code. It is commonly referred to as an execution plan.
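A simple way to see the Command Parser in isolation is SET PARSEONLY, which asks SQL Server to check syntax only, without building a plan or executing anything:

```sql
-- Syntax check only: no plan is generated and nothing is executed
SET PARSEONLY ON;
SELECT FirstName FROM;   -- deliberately broken; the parser returns a syntax error
SET PARSEONLY OFF;
```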
Plan Cache:
Creating execution plans can be time-consuming and resource-intensive, so the plan cache, part of SQL Server’s buffer pool, is used to store execution plans in case they are needed later. An LRU-style algorithm governs the plan cache, so the plans that have gone longest without being reused are flushed out when space is needed.
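You can look inside the plan cache with the sys.dm_exec_cached_plans DMV; for example, a sketch that lists the most frequently reused plans:

```sql
-- Most frequently reused plans currently in the plan cache
SELECT TOP (10)
       cp.usecounts,       -- how many times the cached plan has been reused
       cp.objtype,         -- Adhoc, Prepared, Proc, ...
       cp.size_in_bytes,
       st.text AS query_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;
```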
Query Optimizer:
The Query Optimizer is one of the most complex and secretive parts of the product. It is what’s known as a “cost-based” optimizer, which means that it evaluates multiple ways to execute a query and then picks the method that it deems will have the lowest cost to execute. This “method” of executing is implemented as a query plan and is the output from the optimizer.
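To see the plan the optimizer has chosen without actually running the query, you can request the estimated plan; a sketch, assuming a hypothetical dbo.Employees table:

```sql
-- Return the optimizer's chosen plan as XML instead of executing the query
SET SHOWPLAN_XML ON;
GO
SELECT FirstName, LastName
FROM dbo.Employees          -- hypothetical table used for illustration
WHERE EmployeeID = 1;
GO
SET SHOWPLAN_XML OFF;
GO
```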
Query Executor:
The Query Executor’s job is self-explanatory; it executes the query. To be more specific, it executes the query plan by working through each step it contains and interacting with the Storage Engine to retrieve or modify data.
The Storage Engine:
The Storage engine is responsible for managing all I/O to the data and contains the Access Methods code, which handles I/O requests for rows, indexes, pages, allocations and row versions, and a Buffer Manager, which deals with SQL Server’s main memory consumer, the buffer pool. It also contains a Transaction Manager, which handles the locking of data to maintain Isolation (ACID properties) and manages the transaction log.
Access Methods:
Access Methods is a collection of code that provides the storage structures for data and indexes as well as the interface through which data is retrieved and modified. It contains all the code to retrieve data but it doesn’t actually perform the operation itself; it passes the request to the Buffer Manager.
Buffer Manager:
The Buffer Manager manages the buffer pool, which represents the majority of SQL Server’s memory usage. If you need to read some rows from a page the Buffer Manager will check the data cache in the buffer pool to see if it already has the page cached in memory. If the page is already cached, then the results are passed back to the Access Methods. If the page isn’t already in cache, then the Buffer Manager will get the page from the database on disk, put it in the data cache, and pass the results to the Access Methods.
Data Cache:
The data cache is usually the largest part of the buffer pool; therefore, it’s the largest memory consumer within SQL Server. It is here that every data page that is read from disk is written to before being used.
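The sys.dm_os_buffer_descriptors DMV exposes what the data cache is currently holding; for example, a rough per-database breakdown (each buffer is an 8 KB page):

```sql
-- Approximate amount of each database currently held in the data cache
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024  AS cached_mb      -- 8 KB pages converted to MB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;
```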
Transaction Manager:
The Transaction Manager has two components that are of interest here: a Lock Manager and a Log Manager.
The Lock Manager is responsible for providing concurrency to the data, and it delivers the configured level of isolation by using locks. The Access Methods code requests that the changes it wants to make are logged, and the Log Manager writes the changes to the transaction log. This is called Write-Ahead Logging.
Buffer Pool:
The buffer pool is SQL Server’s main memory cache, the place where data pages are held temporarily while they are read and modified. The Transaction Manager, introduced above, in turn has two components that are of interest here: a Lock Manager and a Log Manager.
Lock Manager:
The Lock Manager is responsible for providing concurrency to the data, and it delivers the configured level of isolation (as defined in the ACID properties) by using locks.
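While a transaction is holding locks in another session, you can see what the Lock Manager has granted with the sys.dm_tran_locks DMV; a minimal sketch:

```sql
-- Locks currently held or requested (excluding the routine database-level locks)
SELECT request_session_id,
       resource_type,        -- OBJECT, PAGE, KEY, ...
       resource_database_id,
       request_mode,         -- e.g. S (shared), X (exclusive)
       request_status        -- GRANT, WAIT, CONVERT
FROM sys.dm_tran_locks
WHERE resource_type <> 'DATABASE';
```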
Log Manager:
The Log Manager is responsible for writing the details of transactions to the transaction log file(s). We will discuss this in more detail when we go through the query life cycle/execution cycle.
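A quick way to see how much space the Log Manager is consuming in each database’s log is the DBCC SQLPERF command:

```sql
-- Transaction log size and percentage used for every database on the instance
DBCC SQLPERF (LOGSPACE);
```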
Apart from these architectural components, we need to look at two more background processes:
Checkpoint Process:
A checkpoint is a point in time created by the checkpoint process at which SQL Server can be sure that any committed transactions have had all their changes written to disk. This checkpoint then becomes the marker from which database recovery can start.
The checkpoint process ensures that any dirty pages associated with a committed transaction will be flushed to disk. Unlike the lazy writer, however, a checkpoint does not remove the page from cache; it makes sure the dirty page is written to disk and then marks the cached page as clean in the page header.
By default, on a busy server, SQL Server will issue a checkpoint roughly every minute, which is marked in the transaction log. If the SQL Server instance or the database is restarted, then the
recovery process reading the log knows that it doesn’t need to do anything with log records prior to the checkpoint.
The time between checkpoints, therefore, represents the amount of work that needs to be done to roll forward any committed transactions that occurred after the last checkpoint, and to roll back any transactions that hadn’t committed. By checkpointing every minute, SQL Server is trying to keep the recovery time when starting a database to less than one minute, but it won’t automatically checkpoint unless at least 10 MB has been written to the log within the period.
Checkpoints can also be manually called by using the CHECKPOINT T-SQL command, and can occur because of other events happening in SQL Server. For example, when you issue a backup command, a checkpoint will run first.
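For example, a manual checkpoint can be issued in the current database like this (the optional argument is a requested duration in seconds):

```sql
-- Flush dirty pages for the current database to disk
CHECKPOINT;

-- Ask SQL Server to spread the checkpoint I/O over roughly 10 seconds
CHECKPOINT 10;
```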
Trace flag 3502 is an undocumented trace flag that records in the error log when a checkpoint starts and stops. For example, after adding it as a startup trace flag and running a workload with numerous writes, the error log showed checkpoints running between 30 and 40 seconds apart.
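If you want to try this yourself, the flag can also be enabled dynamically and the error log read from T-SQL; a sketch for a test instance only (undocumented flags should not be used in production):

```sql
-- Enable the undocumented trace flag globally on a test instance
DBCC TRACEON (3502, -1);

-- ...run a write-heavy workload, then read the current error log
EXEC sys.xp_readerrorlog;
```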
Lazy Writer:
The lazy writer is a thread that periodically checks the size of the free buffer list. When it’s low, it scans the whole data cache to age-out any pages that haven’t been used for a while. If it finds any
dirty pages that haven’t been used for a while, they are flushed to disk before being marked as free in memory. The lazy writer also monitors the free physical memory on the server and will release memory from the free buffer list back to Windows in very low memory conditions. When SQL Server is busy, it will also grow the size of the free buffer list (and therefore the buffer pool) to meet demand, provided there is free physical memory and the configured Max Server Memory threshold hasn’t been reached.
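Two Buffer Manager performance counters give a feel for lazy writer activity and how long pages survive in the data cache; they can be read straight from T-SQL:

```sql
-- Lazy writes/sec shows how often the lazy writer is flushing pages;
-- Page life expectancy shows how long a page is expected to stay in cache
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Lazy writes/sec', 'Page life expectancy');
```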
A Basic select Statement Life Cycle Summary:
1. The SQL Server Network Interface (SNI) on the client established a connection to the SNI on the SQL Server using a network protocol such as TCP/IP. It then created a connection to a
TDS endpoint over the TCP/IP connection and sent the SELECT statement to SQL Server as a TDS message.
2. The SNI on the SQL Server unpacked the TDS message, read the SELECT statement, and passed a “SQL Command” to the Command Parser.
3. The Command Parser checked the plan cache in the buffer pool for an existing, usable query plan. When it didn’t find one, it created a query tree based on the SELECT statement and passed it to the Optimizer to generate a query plan.
4. The Optimizer generated a “zero cost” or “trivial” plan in the pre-optimization phase because the statement was so simple. The query plan created was then passed to the Query Executor for execution.
5. At execution time, the Query Executor determined that data needed to be read to complete the query plan so it passed the request to the Access Methods in the Storage Engine via an OLE DB interface.
6. The Access Methods needed to read a page from the database to complete the request from the Query Executor and asked the Buffer Manager to provision the data page.
7. The Buffer Manager checked the data cache to see if it already had the page in cache. It wasn’t in cache so it pulled the page from disk, put it in cache, and passed it back to the Access Methods.
8. Finally, the Access Methods passed the result set back to the Relational Engine to send to the client.
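To watch step 7 from the client side, you can turn on STATISTICS IO for a simple SELECT; logical reads are pages served from the data cache, physical reads are pages the Buffer Manager had to fetch from disk (dbo.Employees is a hypothetical table used for illustration):

```sql
-- Logical reads = pages found in the data cache; physical reads = pages read from disk
SET STATISTICS IO ON;

SELECT FirstName, LastName
FROM dbo.Employees           -- hypothetical table
WHERE EmployeeID = 1;

SET STATISTICS IO OFF;
```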
A simple Update Query:
The process is exactly the same as the process for the SELECT statement you just looked at until you get to the Access Methods.
The Access Methods need to make a data modification this time, so before the I/O request is passed on, the details of the change need to be persisted to disk. That is the job of the Transaction Manager. The details of the transaction are written to the LDF file by the Log Manager; this mechanism is called Write-Ahead Logging (WAL).
The Lock Manager takes an exclusive lock at the appropriate scope on the object being modified. The page is then pulled into memory and modified; such a modified page is called a dirty page. Dirty pages are later flushed to the data file on disk by a checkpoint, and the Lock Manager releases its locks on the object once the transaction completes.
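A minimal sketch of such an update, again using the hypothetical dbo.Employees table: the log records describing the change are hardened to the LDF file at commit (write-ahead logging), while the dirty page itself is flushed to the data file later by a checkpoint:

```sql
BEGIN TRANSACTION;

UPDATE dbo.Employees             -- hypothetical table
SET    LastName = N'Smith'
WHERE  EmployeeID = 1;           -- Lock Manager takes an exclusive (X) lock

COMMIT TRANSACTION;              -- log records are written to the transaction log (WAL);
                                 -- the dirty page is flushed to disk by a later checkpoint
```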