Different approaches for paging SQL data

Pagination is a common technique to improve user experience while pulling large chunk of data from SQL. But even pulling paged data could become an overhead while retrieving data across multiple tables. Here I will go through some of the techniques for pagination.

For illustration I am assuming there is an "Activities" table that holds logs for all activities throughout an application. I will be retriving paged data from this table.

Approach 1: using ROW_NUMBER

This is the most common approach for getting paged data from SQL query and it works pretty well for simple queries. Here is an example:
DECLARE @StartIndex INT, @EndIndex INT;
SET @StartIndex = 51
SET @EndIndex = 100
 
SELECT * FROM 
(
 SELECT Count(1) OVER () AS TotalCount,
        ROW_NUMBER() OVER(ORDER BY CreatedDate DESC) AS RowNumber, 
        *
 FROM Activities
 -- add additional filters
) AllRecords 
WHERE RowNumber >= @StartIndex AND RowNumber <= @EndIndex


Approach 2: using temporary table

Suppose there are some associations of the activities with other entities in the system and I need to join some other tables and/or addtitional where clauses on demand. Now it could be relatively slow when the joined tables contain large number of records based on the complexity of the query. Using temporary table can be another option in this scenario:
DECLARE @StartIndex INT, @EndIndex INT;
SET @StartIndex = 51
SET @EndIndex = 100
 
SELECT  IDENTITY(INT,1,1) AS Id, 
        a.ActivityType, 
        u.Name AS CreatedBy,
        o.Name AS Organization, 
        i.Name AS Individual, 
        e.Name AS Employee        
 
INTO #temptable
 
FROM Activities a INNER JOIN
-- adding some complexity
(
 SELECT * FROM 
 (
  SELECT ActivityId, EntityType, EntityId  
  FROM ActivityAssociations    
 ) X
 PIVOT 
 (
  MAX(EntityId) 
  FOR EntityType IN([Organization],[Individual],[Employee])
 ) Y 
) Associations ON Associations.ActivityId=a.Id LEFT JOIN 
Organizations o ON Associations.Organization = o.Id LEFT JOIN
Individuals i ON Associations.Individual = i.Id LEFT JOIN
Employees e ON Associations.Employee = e.Id JOIN
Users u ON u.Guid = a.CreatedByGuid  
 
-- filters
WHERE a.ActivityType = 'Email' AND 
      a.CreatedByGuid = '03764AC4-011B-B41B-CA1D-EC80F9B5CCBA'
 
-- sorting
ORDER BY a.CreatedDate
 
SELECT * FROM #temptable
WHERE Id >= @StartIndex AND Id <= @EndIndex
 
SELECT @TotalCount = MAX(Id) FROM #temptable
 
DROP TABLE #temptable


Approach 3: using CTE

The temporary table creation part can be replaced using common table expression (CTE):
DECLARE @StartIndex INT, @EndIndex INT;
SET @StartIndex = 51
SET @EndIndex = 100
 
WITH Associations AS
(
 SELECT ROW_NUMBER() OVER(ORDER BY CreatedDate DESC) AS RowNumber, * 
 FROM Activities a INNER JOIN
 (
  SELECT ActivityId, EntityType, EntityId  
  FROM ActivityAssociations  
 ) x ON x.ActivityId = a.Id
 PIVOT 
 (
  MAX(EntityId) 
  FOR EntityType IN([Organization],[Individual],[Employee])
 ) y
 WHERE ActivityType = 'Email' AND 
       CreatedByGuid = '03764AC4-011B-B41B-CA1D-EC80F9B5CCBA'
)
SELECT  Count(1) OVER () AS TotalCount, 
        a.ActivityType, 
        u.Name as CreatedBy, 
        o.Name AS Organization, 
        i.Name AS Individual, 
        e.Name AS Employee
FROM Associations a LEFT JOIN 
 Organizations o ON a.Organization = o.Id LEFT JOIN
 Individuals i ON a.Individual = i.Id LEFT JOIN
 Employees e ON a.Employee = e.Id JOIN
 Users u ON u.Guid = a.CreatedByGuid  
WHERE   a.RowNumber >= @StartIndex AND a.RowNumber <= @EndIndex


Approach 4: paging with Fetch

This feature is available on SQL Server 2012 and usually performs faster than the previous techniques. Here is an example:
DECLARE @StartIndex INT, @MaxRecords INT;
SET @StartIndex = 51
SET @MaxRecords = 50
 
SELECT TotalCount = COUNT(*) OVER(), * FROM
(
 SELECT  y.Guid, 
                u.Name as CreatedBy, 
                y.CreatedDate, 
                o.Name AS Organization, 
                i.Name AS Individual, 
                e.Name AS Employee
 FROM Activities a INNER JOIN
 (
  SELECT ActivityId, EntityType, EntityId  
  FROM   ActivityAssociations  
 ) x ON  x.OwnerGuid = a.Guid
 PIVOT 
 (
  MAX(EntityId) 
                FOR EntityType IN([Organization],[Individual],[Employee])
 ) y LEFT JOIN 
 Organizations o ON y.Organization = o.Id LEFT JOIN
 Individuals i ON y.Individual = i.Id LEFT JOIN
 Employees e ON y.Employee = e.Id JOIN
 Users u ON u.Guid = y.CreatedByGuid 
 WHERE ActivityType = 'Email' AND 
       CreatedByGuid = '03764AC4-011B-B41B-CA1D-EC80F9B5CCBA'
) AllRecords
ORDER BY CreatedDate DESC
OFFSET (@StartIndex-1) ROWS
FETCH NEXT @MaxRecords ROWS ONLY


The ultimate approach for messy query

Sometimes the query along with its huge underlying data gets complicated enough to slower the performance and adding pagination techniques doesn't help that much. In that situation the best thing in my mind is to create a flat denormalized table containing all the information needed and to synchronize (eventually) the table from application/database when any relevant data get changed.

Also in certain cases we can use seperate simple COUNT() aggregate for getting total count, since getting the count using same query puts an overhead to its execution plan.

In conclusion, different approach can be used in diffrent scenarios. It's up to the developer to figure out the best possible technique for particular case.

Comments

Popular posts from this blog

Adding security headers to prevent XSS and MiTM attacks in .Net Core

Creating transformations for custom config files

Microsoft.IdentityModel.Protocols.OpenIdConnectProtocolInvalidNonceException