Sunday, July 15, 2018

SQL Server Architecture


SQL Server Architecture can be divided into two parts:
Storage Engine & Relational Engine


The Relational Engine has three different components:

1. Command parser
2. Query Executor
3. Query Optimizer

Before discussing the Relational Engine, we need to discuss the protocol layer and TDS communication endpoints.

TDS (Tabular Data Stream) Endpoints:

TDS is a Microsoft-proprietary protocol originally designed by Sybase that is used to interact with a database server. Once a connection has been made using a network protocol such as TCP/IP, a link is established to the relevant TDS endpoint that then acts as the communication point between the client and the server.

There is one TDS endpoint for each network protocol and an additional one reserved for use by the dedicated administrator connection (DAC). Once connectivity is established, TDS messages are used to communicate between the client and the server. The SELECT statement is sent to the SQL Server as a TDS message across a TCP/IP connection (TCP/IP is the default protocol).

Protocol Layer:

When the protocol layer in SQL Server receives your TDS packet, it has to reverse the work of the SNI at the client and unwrap the packet to find out what request it contains. The protocol layer is also responsible for packaging up results and status messages to send back to the client as TDS messages. Our SELECT statement is marked in the TDS packet as a message of type “SQL Command,” so it’s passed on to the next component, the Command Parser, to begin the path toward execution.

At the client, the statement was wrapped in a TDS packet by the SQL Server Network Interface and sent to the protocol layer on the SQL Server, where it was unwrapped, identified as a “SQL Command,” and passed to the Command Parser by the SNI.

The Relational Engine:

The Relational Engine is also sometimes called the query processor because its primary function is query optimization and execution. It contains a Command Parser to check query syntax and prepare query trees, a Query Optimizer that is arguably the crown jewel of any database system, and a Query Executor responsible for execution.

Command Parser:

The Command Parser’s role is to handle T-SQL language events. It first checks the syntax and returns any errors back to the protocol layer to send to the client. If the syntax is valid, then the next step is to generate a query plan or find an existing plan. A query plan contains the details about how SQL Server is going to execute a piece of code. It is commonly referred to as an execution plan.

Plan Cache:

Creating execution plans can be time-consuming and resource intensive, so the plan cache, part of SQL Server’s buffer pool, is used to store execution plans in case they are needed later. A least recently used (LRU) algorithm governs the plan cache: plans that haven’t been reused recently are aged out to make room for new ones.
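A quick way to peek at what’s currently cached is to query the plan cache DMVs; this is just a sketch, and it assumes you have VIEW SERVER STATE permission:

-- List the most frequently reused cached plans
SELECT TOP (20)
    cp.usecounts,     -- how many times the plan has been reused
    cp.cacheobjtype,  -- e.g., Compiled Plan
    cp.objtype,       -- e.g., Adhoc, Proc, Prepared
    st.text           -- the statement the plan was compiled for
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
ORDER BY cp.usecounts DESC;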

Query Optimizer:

The Query Optimizer is one of the most complex and secretive parts of the product. It is what’s known as a “cost-based” optimizer, which means that it evaluates multiple ways to execute a query and then picks the method that it deems will have the lowest cost to execute. This “method” of executing is implemented as a query plan and is the output from the optimizer.
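To see the optimizer’s output without actually running a statement, you can ask for the estimated plan as XML. A minimal sketch (the table name is purely illustrative):

SET SHOWPLAN_XML ON;
GO
SELECT * FROM dbo.SomeTable;  -- not executed; the XML query plan is returned instead of rows
GO
SET SHOWPLAN_XML OFF;
GO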

Query Executor:

The Query Executor’s job is self-explanatory; it executes the query. To be more specific, it executes the query plan by working through each step it contains and interacting with the Storage Engine to retrieve or modify data.

The Storage Engine:

The Storage engine is responsible for managing all I/O to the data and contains the Access Methods code, which handles I/O requests for rows, indexes, pages, allocations and row versions, and a Buffer Manager, which deals with SQL Server’s main memory consumer, the buffer pool. It also contains a Transaction Manager, which handles the locking of data to maintain Isolation (ACID properties) and manages the transaction log.

Access Methods:

Access Methods is a collection of code that provides the storage structures for data and indexes as well as the interface through which data is retrieved and modified. It contains all the code to retrieve data but it doesn’t actually perform the operation itself; it passes the request to the Buffer Manager.

Buffer Manager:

The Buffer Manager manages the buffer pool, which represents the majority of SQL Server’s memory usage. If you need to read some rows from a page the Buffer Manager will check the data cache in the buffer pool to see if it already has the page cached in memory. If the page is already cached, then the results are passed back to the Access Methods. If the page isn’t already in cache, then the Buffer Manager will get the page from the database on disk, put it in the data cache, and pass the results to the Access Methods.
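To get a rough feel for what the Buffer Manager is holding, you can count cached pages per database; a sketch, again assuming VIEW SERVER STATE permission:

-- Pages currently held in the data cache, per database
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) AS cached_pages
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_pages DESC;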

Data Cache: 

The data cache is usually the largest part of the buffer pool; therefore, it’s the largest memory consumer within SQL Server. It is here that every data page that is read from disk is written to before being used.

Transaction Manager:

The Transaction Manager has two components that are of interest here: a Lock Manager and a Log Manager.

The Lock Manager is responsible for providing concurrency to the data, and it delivers the configured level of isolation by using locks. The Access Methods code requests that the changes it wants to make are logged, and the Log Manager writes the changes to the transaction log. This is called Write-Ahead Logging.

Buffer Pool:

The buffer pool is an area of memory where data pages are held temporarily. As described above, the Transaction Manager (part of the Storage Engine rather than the buffer pool) in turn has two components that are of interest here: a Lock Manager and a Log Manager.

Lock Manager:

The Lock Manager is responsible for providing concurrency to the data, and it delivers the configured level of isolation (as defined in the ACID properties) by using locks.

Log Manager:

The Log Manager is responsible for writing the details of transactions to the transaction log files and for managing the behavior of those files. We can discuss this in more detail when we go through the query life cycle/execution cycle.

Apart from the components of the architecture, we need to discuss two more processes:

Checkpoint Process:

A checkpoint is a point in time created by the checkpoint process at which SQL Server can be sure that any committed transactions have had all their changes written to disk. This checkpoint then becomes the marker from which database recovery can start.

The checkpoint process ensures that any dirty pages associated with a committed transaction will be flushed to disk. Unlike the lazy writer, however, a checkpoint does not remove the page from cache; it makes sure the dirty page is written to disk and then marks the cached page as clean in the page header.

By default, on a busy server, SQL Server will issue a checkpoint roughly every minute, which is marked in the transaction log. If the SQL Server instance or the database is restarted, then the recovery process reading the log knows that it doesn’t need to do anything with log records prior to the checkpoint.

The time between checkpoints, therefore, represents the amount of work that needs to be done to roll forward any committed transactions that occurred after the last checkpoint, and to roll back any transactions that hadn’t committed. By checkpointing every minute, SQL Server is trying to keep the recovery time when starting a database to less than one minute, but it won’t automatically checkpoint unless at least 10 MB has been written to the log within the period.

Checkpoints can also be manually called by using the CHECKPOINT T-SQL command, and can occur because of other events happening in SQL Server. For example, when you issue a backup command, a checkpoint will run first.
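For example, a manual checkpoint is a one-liner; the optional argument is a target duration in seconds (the database name here is illustrative):

USE AdventureWorks;
GO
CHECKPOINT;     -- checkpoint the current database
GO
CHECKPOINT 10;  -- ask SQL Server to complete the checkpoint within about 10 seconds
GO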

Trace flag 3502 is an undocumented trace flag that records in the error log when a checkpoint starts and stops. For example, after adding it as a startup trace flag and running a workload with numerous writes, my error log contained entries indicating that checkpoints were running between 30 and 40 seconds apart.

Lazy Writer:

The lazy writer is a thread that periodically checks the size of the free buffer list. When it’s low, it scans the whole data cache to age-out any pages that haven’t been used for a while. If it finds any dirty pages that haven’t been used for a while, they are flushed to disk before being marked as free in memory. The lazy writer also monitors the free physical memory on the server and will release memory from the free buffer list back to Windows in very low memory conditions. When SQL Server is busy, it will also grow the size of the free buffer list (and therefore the buffer pool) to meet demand when there is free physical memory and the configured Max Server Memory threshold hasn’t been reached.



A Basic SELECT Statement Life Cycle Summary:

1. The SQL Server Network Interface (SNI) on the client established a connection to the SNI on the SQL Server using a network protocol such as TCP/IP. It then created a connection to a
TDS endpoint over the TCP/IP connection and sent the SELECT statement to SQL Server as a TDS message.
2. The SNI on the SQL Server unpacked the TDS message, read the SELECT statement, and passed a “SQL Command” to the Command Parser.
3. The Command Parser checked the plan cache in the buffer pool for an existing, usable query plan. When it didn’t find one, it created a query tree based on the SELECT statement and passed it to the Optimizer to generate a query plan.
4. The Optimizer generated a “zero cost” or “trivial” plan in the pre-optimization phase because the statement was so simple. The query plan created was then passed to the Query Executor for execution.
5. At execution time, the Query Executor determined that data needed to be read to complete the query plan so it passed the request to the Access Methods in the Storage Engine via an OLE DB interface.
6. The Access Methods needed to read a page from the database to complete the request from the Query Executor and asked the Buffer Manager to provision the data page.
7. The Buffer Manager checked the data cache to see if it already had the page in cache. It wasn’t in cache so it pulled the page from disk, put it in cache, and passed it back to the Access Methods.
8. Finally, the Access Methods passed the result set back to the Relational Engine to send to the client.

A Simple UPDATE Query:

The process is exactly the same as the process for the SELECT statement you just looked at until you get to the Access Methods.

The Access Methods need to make a data modification this time, so before the I/O request is passed on, the details of the change need to be persisted to disk. That is the job of the Transaction Manager. The details of the transaction are written to the LDF (transaction log) file with the help of the Log Manager; this mechanism is known as Write-Ahead Logging (WAL).

The Lock Manager takes an exclusive lock at the appropriate object scope. The page is then pulled into memory and modified; such a modified page is called a dirty page.

At the next checkpoint, the dirty pages in the buffer pool are flushed to the data file on disk. The Lock Manager releases its lock on the object when the transaction commits; the commit is durable as soon as the log records reach disk, even if the data pages themselves are flushed later.
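If you want to watch this happen, you can count dirty pages before and after issuing a CHECKPOINT; a sketch assuming VIEW SERVER STATE permission:

-- Dirty (modified-in-memory) pages per database
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) AS dirty_pages
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY database_id;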

SQL Server Port Numbers



SQL Server has evolved from a simple relational database engine to a multipurpose enterprise-level data platform. The subsystems and features that Microsoft has added, and continues to add, to SQL Server have significantly increased the network connections that the platform uses. Sometimes it's tricky to figure out which firewall ports to open for each SQL Server feature. To help you, here's a rundown of commonly used SQL Server network ports.

TCP 1433
TCP port 1433 is the default port for SQL Server. This port is also the official Internet Assigned Number Authority (IANA) socket number for SQL Server. Client systems use TCP 1433 to connect to the database engine; SQL Server Management Studio (SSMS) uses the port to manage SQL Server instances across the network. You can reconfigure SQL Server to listen on a different port, but 1433 is by far the most common implementation.

TCP 1434
TCP port 1434 is the default port for the Dedicated Admin Connection. You can start the Dedicated Admin Connection through sqlcmd or by typing ADMIN: followed by the server name in the SSMS Connect to Database Engine dialog box.
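For example, from a command prompt you could open the DAC with sqlcmd (the server and instance names here are placeholders):

sqlcmd -S ADMIN:MYSERVER\MYINSTANCE -E

or let sqlcmd request the DAC for you with its -A switch:

sqlcmd -S MYSERVER\MYINSTANCE -E -A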


UDP 1434

UDP port 1434 is used for SQL Server named instances. The SQL Server Browser service listens on this port for incoming connections to a named instance. The service then responds to the client with the TCP port number for the requested named instance.


TCP 2383

TCP port 2383 is the default port for SQL Server Analysis Services.


TCP 2382

TCP port 2382 is used for connection requests to a named instance of Analysis Services. Much like the SQL Server Browser service does for the relational database engine on UDP 1434, the SQL Server Browser listens on TCP 2382 for requests for Analysis Services named instances. Analysis Services then redirects the request to the appropriate port for the named instance.


TCP 135

TCP port 135 has several uses. The Transact-SQL debugger uses the port. TCP 135 is also used to start, stop, and control SQL Server Integration Services, although it is required only if you connect to a remote instance of the service from SSMS.

TCP 80 and 443

TCP ports 80 and 443 are most typically used for report server access. However, they also support URL requests to SQL Server and Analysis Services. TCP 80 is the standard port for HTTP connections that use a URL. TCP 443 is used for HTTPS connections that use Secure Sockets Layer (SSL).


Unofficial TCP Ports

Microsoft uses TCP port 4022 for SQL Server Service Broker examples in SQL Server Books Online. Likewise, Database Mirroring examples use TCP port 7022.

Database Statistics



The SQL Server Query Optimizer uses statistics to estimate the distribution of values in one or more columns of a table or indexed view, and with it the number of rows in query results, in order to create a high-quality query execution plan. Statistics are often created on a single column, but it’s not uncommon to create statistics on multiple columns.
Each statistics object contains a histogram displaying the distribution of values of the column (or of the first column, in the case of multi-column statistics). Multi-column statistics also contain a correlation of values among the columns (called densities), which is derived from the number of distinct rows of column values.
There are different ways you can view the details of the statistics objects. For example, you can use the DBCC SHOW_STATISTICS command. DBCC SHOW_STATISTICS shows the header, histogram, and density vector based on data stored in the statistics object.
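For example, against the AdventureWorks sample database (the object and statistics names below are illustrative):

DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName');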

Statistics Options:

There are three options that you can set that affect when and how statistics are created and updated. These options are set at the database level only.


AUTO_CREATE_STATISTICS Option:

When the automatic create statistics option, AUTO_CREATE_STATISTICS, is on, the query optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan. These single-column statistics are created on columns that do not already have a histogram in an existing statistics object. The AUTO_CREATE_STATISTICS option does not determine whether statistics get created for indexes. This option also does not generate filtered statistics. It applies strictly to single-column statistics for the full table.
When the query optimizer creates statistics as a result of using the AUTO_CREATE_STATISTICS option, the statistics name starts with _WA. You can use the following query to determine if the query optimizer has created statistics for a query predicate column.

SELECT OBJECT_NAME(s.object_id) AS object_name,
    COL_NAME(sc.object_id, sc.column_id) AS column_name,
    s.name AS statistics_name
FROM sys.stats AS s JOIN sys.stats_columns AS sc
    ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id
WHERE s.name LIKE '[_]WA%' -- [_] escapes the underscore, which LIKE otherwise treats as a single-character wildcard
ORDER BY s.name;

AUTO_UPDATE_STATISTICS Option:

When the automatic update statistics option, AUTO_UPDATE_STATISTICS, is on, the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query. Statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view. The query optimizer determines when statistics might be out-of-date by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold. The threshold is based on the number of rows in the table or indexed view.
  • SQL Server (2014 and earlier) uses a threshold based on the percent of rows changed. This is regardless of the number of rows in the table.
  • SQL Server (starting with 2016 and under the compatibility level 130) uses a threshold that adjusts according to the number of rows in the table. With this change, statistics on large tables will be updated more often.
The query optimizer checks for out-of-date statistics before compiling a query and before executing a cached query plan. Before compiling a query, the query optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date. Before executing a cached query plan, the Database Engine verifies that the query plan references up-to-date statistics.
The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single-columns in query predicates, and statistics created with the CREATE STATISTICS statement. This option also applies to filtered statistics.


AUTO_UPDATE_STATISTICS_ASYNC:

The asynchronous statistics update option, AUTO_UPDATE_STATISTICS_ASYNC, determines whether the query optimizer uses synchronous or asynchronous statistics updates. By default, the asynchronous statistics update option is off, and the query optimizer updates statistics synchronously. The AUTO_UPDATE_STATISTICS_ASYNC option applies to statistics objects created for indexes, single columns in query predicates, and statistics created with the CREATE STATISTICS statement.
Statistics updates can be either synchronous (the default) or asynchronous. With synchronous statistics updates, queries always compile and execute with up-to-date statistics; when statistics are out-of-date, the query optimizer waits for updated statistics before compiling and executing the query. With asynchronous statistics updates, queries compile with existing statistics even if the existing statistics are out-of-date; the query optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles. Queries that compile after the asynchronous updates have completed will benefit from using the updated statistics.
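All three options are set per database with ALTER DATABASE; a sketch using a placeholder database name:

ALTER DATABASE YourDatabase SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON;
-- The asynchronous option only has an effect when AUTO_UPDATE_STATISTICS is also ON
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS_ASYNC ON;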

Instance Aware & Unaware Services



1. Instance Aware Services

Instance-aware services are associated with a specific instance of SQL Server, and have their own registry hives. We can install multiple copies of instance-aware services by running SQL Server Setup for each component or service.

For example:
1. SQL Server
2. SQL Server Agent
3. Analysis Service
4. Reporting Service
5. Full-text search


2. Instance Unaware Services

Instance-unaware services are shared among all installed SQL Server instances. They are not associated with a specific instance, are installed only once, and cannot be installed side by side.
For example:
1. Integration Services
2. SQL Server Browser
3. SQL Server Active Directory Helper
4. SQL Writer

New Features in SQL Server 2016

1. Always Encrypted:
Always Encrypted is designed to protect data at rest and in motion. With Always Encrypted, SQL Server can perform operations on encrypted data, and the encryption key can reside with the application. Encryption and decryption of data happen transparently inside the application. This means the data stored in SQL Server will be encrypted, which can secure it from DBAs and administrators, but that also has implications for ad hoc queries, reporting, and exporting the data.

2. Stretch Database:
The idea behind this feature is certainly interesting. The upcoming Stretch Database feature will allow you to dynamically stretch your on-premises database to Azure. This would enable your frequently accessed (hot) data to stay on-premises and your infrequently accessed (cold) data to be moved to the cloud. This could enable you to take advantage of low-cost Azure storage and still have high-performance applications. However, this is one trick where Microsoft really needs to get the partitioning right to keep your queries from straying into the cloud and killing your performance.

3. Real-time Operational Analytics:
This feature uses the dynamic duo of SQL Server’s in-memory technologies: it combines In-Memory OLTP with the in-memory columnstore for real-time operational analytics. Its purpose is to tune your system for optimal transactional performance as well as to increase workload concurrency. This sounds like a great combination, and applying analytics to your system’s performance is something a lot of customers have been asking for for a long time, but you will certainly need to have the memory to take advantage of it.

4. PolyBase into SQL Server:
Big Data continues to grow in strategic importance, but unless you had the SQL Server Parallel Data Warehouse (PDW), connecting SQL Server to Big Data, and Hadoop in particular, was limited and difficult. In previous releases, PDW was the only version of SQL Server that came with PolyBase, a technology that bridged SQL Server and Hadoop by enabling you to construct and run SQL queries over Hadoop data stores, eliminating the need to understand HDFS or MapReduce. SQL Server 2016 promises to bring the PolyBase technology mainstream into the primary SQL Server SKUs (probably the Enterprise edition).

5. Native JSON Support:
JSON (JavaScript Object Notation) is a standardized data exchange format that, before this release, was not supported natively by SQL Server. To perform JSON imports and exports you needed to hand-code complex T-SQL, SQLCLR, or JavaScript. SQL Server 2016 promises to simplify this by incorporating JSON support directly into SQL Server, much like XML. SQL Server 2016 will natively parse and store JSON as relational data and will support exporting relational data to JSON.
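As a taste of what native support looks like, SQL Server 2016 can shape a result set as JSON directly in T-SQL; a minimal sketch:

-- Return rows as a JSON array (SQL Server 2016 and later)
SELECT TOP (2) name, database_id
FROM sys.databases
FOR JSON AUTO;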

6. Enhancements to AlwaysOn:
SQL Server 2016 will also continue to advance high availability and disaster recovery with several enhancements to AlwaysOn. The upcoming SQL Server 2016 release will enhance AlwaysOn with the ability to have up to three synchronous replicas. Additionally, it will include DTC (Distributed Transaction Coordinator) support as well as support for round-robin load balancing of the secondary replicas. There will also be support for automatic failover based on database health.

7. Enhanced In-Memory OLTP:
First introduced with SQL Server 2014, In-Memory OLTP will continue to mature in SQL Server 2016. Microsoft will enhance In-Memory OLTP by extending the functionality to more applications while also enhancing concurrency. This means they will be expanding the T-SQL surface area, increasing the total amount of memory supported into the terabyte range as well as supporting a greater number of parallel CPUs.

8. Revamped SQL Server Data Tools:
Another welcome change in SQL Server 2016 is the reconsolidation of SQL Server Data Tools (SSDT). As Microsoft worked to supplant the popular and useful Business Intelligence Development Studio (BIDS) with SQL Server Data Tools, it wound up confusing almost everyone by creating not one but two versions of SQL Server Data Tools, both of which needed to be downloaded separately from SQL Server itself. With the SQL Server 2016 release, Microsoft has indicated that it intends to reconsolidate SQL Server Data Tools.

New Features in SQL Server 2014


Microsoft introduced some significant enhancements in SQL Server 2014—especially with In-Memory OLTP (online transaction processing). However, as you might expect after such a short release cycle, not every subsystem has been updated; there are no major changes to SQL Server Integration Services (SSIS), SQL Server Replication Services, or SQL Server Reporting Services (SSRS). Nonetheless, there are plenty of significant enhancements. Here are 10 new features in SQL Server 2014.

1. In-Memory OLTP Engine:

SQL Server 2014 enables memory optimization of selected tables and stored procedures. The In-Memory OLTP engine is designed for high concurrency and uses a new optimistic concurrency control mechanism to eliminate locking delays. Microsoft states that customers can expect performance to be up to 20 times better than with SQL Server 2012 when using this new feature.

2. AlwaysOn Enhancements:

Microsoft has enhanced AlwaysOn integration by expanding the maximum number of secondary replicas from four to eight. Readable secondary replicas are now also available for read workloads, even when the primary replica is unavailable. In addition, SQL Server 2014 provides the new Add Azure Replica Wizard, which helps you create asynchronous secondary replicas in Windows Azure.

3. Buffer Pool Extension:

SQL Server 2014 provides a new solid state disk (SSD) integration capability that lets you use SSDs to expand the SQL Server 2014 Buffer Pool as nonvolatile RAM (NvRAM). With the new Buffer Pool Extensions feature, you can use SSD drives to expand the buffer pool in systems that have maxed out their memory. Buffer Pool Extensions can provide performance gains for read-heavy OLTP workloads.
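Enabling the Buffer Pool Extension is a single server-level command; the file path and size below are placeholders:

ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'F:\SSDCACHE\BufferPool.BPE', SIZE = 50 GB);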

4. Updateable Columnstore Indexes:

When Microsoft introduced the columnstore index in SQL Server 2012, it provided improved performance for data warehousing queries. For some queries, the columnstore indexes provided a tenfold performance improvement. However, to utilize the columnstore index, the underlying table had to be read-only. SQL Server 2014 eliminates this restriction with the new updateable Columnstore Index. The SQL Server 2014 Columnstore Index must use all the columns in the table and can’t be combined with other indexes.
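Creating an updateable (clustered) columnstore index is a one-statement operation; the table name here is hypothetical:

-- The clustered columnstore index includes every column and becomes the table's storage
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;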

5. Storage I/O control:

The Resource Governor lets you limit the amount of CPU and memory that a given workload can consume. SQL Server 2014 extends the reach of the Resource Governor to manage storage I/O usage as well. The SQL Server 2014 Resource Governor can limit the physical I/Os issued for user threads in a given resource pool.
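A sketch of capping I/O for a resource pool (the pool name and limits are illustrative):

CREATE RESOURCE POOL ReportingPool
WITH (MIN_IOPS_PER_VOLUME = 20,   -- reserve a floor of I/O operations per volume
      MAX_IOPS_PER_VOLUME = 100); -- cap the pool's physical I/O per volume
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO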

6. Power View for Multidimensional Models:

Power View used to be limited to tabular data. However, with SQL Server 2014, Power View can now be used with multidimensional models (OLAP cubes) and can create a variety of data visualizations including tables, matrices, bubble charts, and geographical maps. Power View multidimensional models also support queries using Data Analysis Expressions (DAX).

7. Power BI for Office 365 Integration:

Power BI for Office 365 is a cloud-based business intelligence (BI) solution that provides data navigation and visualization capabilities. Power BI for Office 365 includes Power Query (formerly code-named Data Explorer), Power Map (formerly code-named GeoFlow), Power Pivot, and Power View.

8. SQL Server Data Tools for Business Intelligence:

The new SQL Server Data Tools for BI (SSDT-BI) is used to create SQL Server Analysis Services (SSAS) models, SSRS reports, and SSIS packages. The new SSDT-BI supports SSAS and SSRS for SQL Server 2014 and earlier, but SSIS projects are limited to SQL Server 2014. In the pre-release version of SQL Server 2014, SQL Server Setup doesn’t install SSDT-BI.

9. Backup Encryption:

One welcome addition to SQL Server 2014 is the ability to encrypt database backups for at-rest data protection. SQL Server 2014 supports several encryption algorithms, including Advanced Encryption Standard (AES) 128, AES 192, AES 256, and Triple DES. You must use a certificate or an asymmetric key to perform encryption for SQL Server 2014 backups.
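A hedged example of an encrypted backup (the path and certificate name are placeholders; the certificate must already exist in master):

BACKUP DATABASE AdventureWorks2014
TO DISK = 'D:\Backups\AdventureWorks2014.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);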

10. SQL Server Managed Backup to Windows Azure:

SQL Server 2014’s native backup supports Windows Azure integration. Although I’m not entirely convinced that I would want to depend on an Internet connection to restore my backups, on-premises SQL Server 2014 and Windows Azure virtual machine (VM) instances support backing up to Windows Azure storage. The Windows Azure backup integration is also fully built into SQL Server Management Studio (SSMS).

Hardware and Software Prerequisites for a Successful SQL Server Installation


Framework:
.NET Framework 3.5 SP1
SQL Server Native Client
SQL Server Setup support files

Software:
SQL Server Setup requires Microsoft Windows Installer 4.5 or a later version.
After installing required components, SQL Server Setup will verify that the computer where SQL Server 2008 will be installed also meets all the other requirements for a successful installation.

Network Software:
Network software requirements for the 64-bit versions of SQL Server 2008 are the same as the requirements for the 32-bit versions.
Supported operating systems have built-in network software. Stand-alone named and default instances support the following network protocols:
  • Shared memory
  • Named Pipes
  • TCP/IP
  • VIA

Virtualization:

SQL Server 2008 is supported in virtual machine environments running on the Hyper-V role in Windows Server 2008 R2 and Windows Server 2008 Standard, Enterprise and Datacenter editions. The virtual machine must run an operating system supported for the specific SQL Server 2008 edition listed later in this topic.

Internet Software:

Microsoft Internet Explorer 6 SP1 or a later version is required for all installations of SQL Server 2008. Internet Explorer 6 SP1 or a later version is required for Microsoft Management Console (MMC), SQL Server Management Studio, Business Intelligence Development Studio, the Report Designer component of Reporting Services, and HTML Help.



Hard Disk:

Disk space requirements will vary with the SQL Server 2008 components you install.

Drive:
A CD or DVD drive, as appropriate, is required for installation from disc.

Display:
SQL Server 2008 graphical tools require VGA or higher resolution: at least 1,024x768 pixels.

Other Devices:
Pointing device: A Microsoft mouse or compatible pointing device is required.
