Total Pageviews

Thursday, June 13, 2013

Performance issues from ORDER BY/GROUP BY - spills in tempdb

It is very common and expected to see a query containing ORDER BY or GROUP BY clause for displaying or grouping purposes. It is also common that developers use ORDER BY clause from a habit without considering its necessity. As a result, queries become slower overtime as the number of records increases.


The Last Supper - by Leonardo da Vinci
Grouping example in Art 
When a sort operation unable to acquire sufficient memory grant and it cannot be done in the memory and must happen in the tempdb. The heavier processing load inside the tempdb degrades overall SQL Server performance significantly. This situation is usually known as “spill to tempdb” or “spills in tempdb”. It is crucial to identify those sort warnings and avoid them whenever possible.

In my experience, I have seen ORDER BY/GROUP BY being used on a VARCHAR (8000) column while retrieving data; even unwisely used on a JOIN clause! Tweaking these queries is a bit tricky and most of the time it is impossible since the front-end application or business logic has already been built-in on this criterion. Creating an index on this column is not possible due to the 900 bytes restriction on an index key column.  So, other than crossing fingers, there is nothing much to do to resolve the performance issue immediately.  

Common Issues:
Following are some common issues that occur due to the misuse of ORDER BY/GROUP BY clause:
1.      Rapid tempdb data file growth.
2.      Increases disk I/O activities on tempdb and tempdb drive.
3.      Introduces lock contention and escalation.
4.      Increases memory grant for sort/hash operation.
5.      Introduces parallel query plan.

Detecting the Issue:
Detecting performance issues that arise from sort operation is quite simple and straight forward. Following are some tips to identify issues:
1.      Review the query and identify columns that are used on ORDER/GROUP clauses.
2.      Review the Execution plan and identify “sort” operators.
3.      Identify parallelism operators that perform the “distribute streams”, ”gather streams” and “repartition streams” in parallel execution plan.
4.      Use SQL Profiler Trace event “sort warnings”.
5.      Extended Event – “sort_warning”
6.      Use PerfMom or sys.dm_os_performance to track “worktables created/sec” and “workfiles created/sec”

To resolve the performance Issue:
To resolve performance issues that occur from a sort operation, a couple of actions can be taken as follows:
1.        Review the necessity of a sort operation in the query.
2.        Try to perform a sort operation in the front-end.
3.        Normalize the database schema.
4.        Create single or multi-column indexes.
5.        Apply filters on indexes.
6.        Use TOP (n) when there is an “ORDER BY”, if possible.
7.        Put more filters in the query to touch less data.
8.        Update distribution statistics.

Observing the behavior:
To observe the common issues with ORDER BY/GROUP BY operations, let’s create a database, table and a simple select statement against 500,000 records.

CREATE DATABASE testDB
GO
USE testDB
GO

SET NOCOUNT ON

IF OBJECT_ID('tblLarge') IS NOT NULL
    DROP TABLE tblLarge
GO

CREATE TABLE tblLarge
    (
      xID INT IDENTITY(1, 1) ,
      sName1 VARCHAR(100) ,
      sName2 VARCHAR(1000) ,
      sName3 VARCHAR(400) ,
      sIdentifier CHAR(100) ,
      dDOB DATETIME NULL ,
      nWage NUMERIC(20, 2) ,
      sLicense VARCHAR(25)
    )
GO

/*********************************
Add 500000 records
**********************************/

SET NOCOUNT ON
INSERT  INTO tblLarge
        ( sName1 ,
          sName2 ,
          sName3 ,
          sIdentifier ,
          dDOB ,
          nWage ,
          sLicense
        )
VALUES  ( LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 50) ,    -- sName1
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 60) ,    -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), RAND() * 70) ,    -- sName2
          LEFT(CAST(NEWID() AS VARCHAR(36)), 2) ,              -- sIdentifier    
          DATEADD(dd, -RAND() * 20000, GETDATE()) ,            -- dDOB
          ( RAND() * 1000 ) ,                                  -- nWage
          SUBSTRING(CAST(NEWID() AS VARCHAR(36)), 6, 7)        -- sLicense       
        )
GO 500000

/******************************************************
** Create a clustered index
******************************************************/
ALTER TABLE [tblLarge]
       ADD  CONSTRAINT [PK_tblLarge]
       PRIMARY KEY CLUSTERED ([xID] ASC)

/***************************************************************
** To resolve the sort warning, create a non-clustered index
***************************************************************/
CREATE NONCLUSTERED INDEX [IX_sName1]
       ON [tblLarge] ([sName1] ASC)


Simple SELECT Statement:
Following are some simple select statements to reproduce the behavior.

/******************************************************
** Simple select statement
******************************************************/
--First query
SELECT  xID ,
        sName1
FROM    tblLarge

-- Second query with - ORDER BY
SELECT  xID ,
        sName1
FROM    tblLarge
ORDER BY sName1

-- Third query - GROUP BY/ORDER BY
SELECT  sName1 ,
        COUNT(sName1) AS nCount
FROM    tblLarge a
GROUP BY sName1
ORDER BY sName1

Using Extended Events (SQL 2012), SQL Profiler Trace and Execution Plan, “sort warning” are easily detectable and following are some outputs.

Figure#1: sort warning using Extended Events in SQL 2012



Figure#2A: sort warning detection using Execution Plan


Figure#2B: sort warning detection using Execution Plan

  
Figure#2: sort warning detection using SQL Profiler Trace


Learn More:

3 comments:

  1. Excellent stuff..no blog has talked so far

    ReplyDelete
  2. Very Good...and best explanation with simple examples....

    ReplyDelete