Sunday, July 15, 2018

Database Statistics



SQL Server Query Optimizer uses statistics to estimate the distribution of values in one or more columns of a table or index views, and the number of rows to create a high-quality query execution plan. Often statistics are created on a single column but it’s not uncommon to create statistics on multiple columns.
Each statistics object contains a histogram displaying the distribution of values of the column (or of the first column in the case of multi-column statistics). Multi-column statistics also contains a correlation of values among the columns (called densities), which are derived from the number of distinct rows or the column values.
There are different ways you can view the details of the statistics objects. For example, you can use the DBCC SHOW_STATISTICS command. DBCC SHOW_STATISTICS shows the header, histogram, and density vector based on data stored in the statistics object.

                                               Statistics Options

 There are three options that you can set that affect when and how statistics are created and updated. These options are set at the database level only.


AUTO_CREATE_STATISTICS Option:

When the automatic create statistics option, AUTO_CREATE_STATISTICS, is on, the query optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan. These single-column statistics are created on columns that do not already have a histogram in an existing statistics object. The AUTO_CREATE_STATISTICS option does not determine whether statistics get created for indexes. This option also does not generate filtered statistics. It applies strictly to single-column statistics for the full table.
When the query optimizer creates statistics as a result of using the AUTO_CREATE_STATISTICS option, the statistics name starts with _WA. You can use the following query to determine if the query optimizer has created statistics for a query predicate column.

SELECT OBJECT_NAME(s.object_id) AS object_name, 
    COL_NAME(sc.object_id, sc.column_id) AS column_name, 
    s.name AS statistics_name 
FROM sys.stats AS s JOIN sys.stats_columns AS sc 
    ON s.stats_id = sc.stats_id AND s.object_id = sc.object_id 
WHERE s.name like '_WA%' 
ORDER BY s.name; 
  

AUTO_UPDATE_STATISTICS Option:

When the automatic update statistics option, AUTO_UPDATE_STATISTICS, is on, the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query. Statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view. The query optimizer determines when statistics might be out-of-date by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold. The threshold is based on the number of rows in the table or indexed view.
  • SQL Server (2014 and earlier) uses a threshold based on the percent of rows changed. This is regardless of the number of rows in the table.
  • SQL Server (starting with 2016 and under the compatibility level 130) uses a threshold that adjusts according to the number of rows in the table. With this change, statistics on large tables will be updated more often.
The query optimizer checks for out-of-date statistics before compiling a query and before executing a cached query plan. Before compiling a query, the query optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date. Before executing a cached query plan, the Database Engine verifies that the query plan references up-to-date statistics.
The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single-columns in query predicates, and statistics created with the CREATE STATISTICS statement. This option also applies to filtered statistics.


AUTO_UPDATE_STATISTICS_ASYNC:

The asynchronous statistics update option, AUTO_UPDATE_STATISTICS_ASYNC, determines whether the query optimizer uses synchronous or asynchronous statistics updates. By default, the asynchronous statistics update option is off, and the query optimizer updates statistics synchronously. The AUTO_UPDATE_STATISTICS_ASYNC option applies to statistics objects created for indexes, single columns in query predicates, and statistics created with the CREATE STATISTICS statement.
Statistics updates can be either synchronous (the default) or asynchronous. With synchronous statistics updates, queries always compile and execute with up-to-date statistics; when statistics are out-of-date, the query optimizer waits for updated statistics before compiling and executing the query. With asynchronous statistics updates, queries compile with existing statistics even if the existing statistics are out-of-date; the query optimizer could choose a suboptimal query plan if statistics are out-of-date when the query compiles. Queries that compile after the asynchronous updates have completed will benefit from using the updated statistics.

No comments:

Post a Comment

SQL Server Architecture

SQL Server Architecture can be divided into two parts: Storage Engine & Relational Engine Relational engine three different compon...