Deadlock due to Lock Partitioning

In a previous post we discussed how to analyze deadlocks involving lock partitions. In this post we are going to look into a rare deadlock condition that arises only when lock partitioning is enabled. On a server where lock partitioning is not enabled, the same script causes only blocking.

The deadlock I am going to demonstrate below is reproducible in SQL Server 2008 R2 (10.50.1600.1) and SQL Server 2012 (11.0.2100.60).

Setup

Make sure lock partitioning is enabled.  Run the below script to create the necessary table. You can skip this step if you have created the table from the previous post.

IF DB_ID('DeadLockTest') IS NULL
    CREATE DATABASE DeadLockTest
GO

USE DeadLockTest
GO

CREATE TABLE dbo.lpdl(id int identity(1,1) primary key clustered, filler char(100) default 'abc')
GO
SET NOCOUNT ON
INSERT INTO dbo.lpdl DEFAULT VALUES
GO 100

Open a new session (Session 1) and run the below script to create a head blocker thread.

BEGIN TRAN

UPDATE dbo.lpdl
SET filler = 'xyz'
WHERE id = 1

--ROLLBACK TRAN

Open another session (Session 2) and run the below script.

BEGIN TRAN

UPDATE dbo.lpdl
SET filler = 'xyz'
WHERE id = 2

SELECT * FROM dbo.lpdl WITH(TABLOCKX)

ROLLBACK TRAN

The update statement in the above script will succeed and the SELECT will be blocked due to the (TABLOCKX) hint.

Now open one more session (Session 3) and run the below script.

BEGIN TRAN

UPDATE dbo.lpdl
SET filler = 'xyz'
WHERE id = 3

SELECT * FROM dbo.lpdl WITH(TABLOCKX)

ROLLBACK TRAN

The update statement in the above script will probably succeed, and the SELECT will be blocked due to the (TABLOCKX) hint. The deadlock can be reproduced only when the IX lock request of Session 3 is granted and the session is then waiting for the X lock on all partitions. Whether the IX lock request is granted depends solely on the local lock partition assigned to the thread: if that partition is one on which Session 2 already holds or is waiting for its X lock, the UPDATE itself will be blocked. Query sys.dm_tran_locks to see the lock partition assigned to each thread.

SELECT request_session_id, resource_lock_partition,request_mode,request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
AND request_mode = 'IX'

My results look like the following.

request_session_id  resource_lock_partition  request_mode  request_status
51                  4                        IX            GRANT
55                  9                        IX            GRANT
56                  10                       IX            GRANT

If the request_status for any of the sessions is WAIT, start over by switching the order of the queries or by starting new connections.

If you see all IX locks in the GRANT state as above, check the X lock requests as well. Session 2 will be waiting for an X lock on the local partition of Session 1, and Session 3 will be waiting for the X lock on partition 0, since every distributed acquire must start at partition 0.
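A query along the following lines can be used for that check. It is only a sketch: it reuses the columns of the IX query above and adds a filter on the database, which assumes the DeadLockTest database from the setup script.

```sql
-- Sketch: list the X lock requests on object resources in the test database.
-- Assumes the DeadLockTest database created in the setup step.
SELECT request_session_id,
       resource_lock_partition,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
  AND resource_database_id = DB_ID('DeadLockTest')
  AND request_mode = 'X'
ORDER BY request_session_id, resource_lock_partition
```

Rows with request_status = WAIT show the partition on which each session is currently queued.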

At this point, go back to Session 1 and run the ROLLBACK TRAN command to remove the blocking. Session 3 will now fail with the deadlock error.

Analyze

Let’s build the deadlock analysis table from the trace data to see what is happening here.

Process         Resource                Mode  State  Command
process4eb4748  OBJECT:26:242099903:0   X     WAIT   SELECT
process4eb4748  OBJECT:26:242099903:10  IX    GRANT  UPDATE
process4c9a153  OBJECT:26:242099903:10  X     WAIT   SELECT
process4c9a153  OBJECT:26:242099903:0   X     GRANT  SELECT

This tells us that process process4eb4748 (Session 3) is holding an IX lock on lock partition 10 and it is waiting for an X lock on lock partition 0. Remember that X lock needs to be acquired on all lock partitions starting from lock partition 0.

Meanwhile, process process4c9a153 (Session 2) is holding an X lock on lock partition 0 (it also holds the X lock on all lock partitions from 0 to 9) and is waiting for an X lock on lock partition 10. Since each process is holding a lock that is incompatible with the lock the other is waiting for, a deadlock ensues.

This deadlock will never happen on a server where lock partitioning is not enabled because, due to the FIFO model of Lock Manager, the first UPDATE statement of the third session will always be blocked by the second session.

Fix

This is an extremely rare scenario. The key to avoiding this kind of issue is to avoid object-level locks. Don’t start looking for a trace flag to disable lock partitioning (which is, by the way, -T1229). Lock partitioning is an extremely useful feature for reducing lock contention, especially on Sch-M, Sch-S, IS and IX locks. Remember that all queries require Sch-S/Sch-M locks, not just on the tables but also on several internal metadata structures.

Note: Apparently there is a bug in the Lock partitioning implementation which causes a distributed acquire for Sch-S locks which can result in deadlock condition. This issue is fixed in SQL 2012.

UPDATE: Robert Dorr discusses the above mentioned bug in his blog post titled “Strange Sch-S / Sch-M Deadlock on Machines with 16 or More Schedulers”.

This post is part of the series SQL Server deadlocks and live locks – Common Patterns. If you have questions, please leave a comment here or reach me at @sqlindian


Deadlocks Involving Lock Partitions

Lock partitioning is a feature in SQL Server, automatically enabled on machines with 16 or more logical processors, aimed at reducing lock contention. To quote from the MSDN link on the topic,

For large computer systems, locks on frequently referenced objects can become a performance bottleneck as acquiring and releasing locks place contention on internal locking resources. Lock partitioning enhances locking performance by splitting a single lock resource into multiple lock resources. This feature is only available for systems with 16 or more CPUs, and is automatically enabled and cannot be disabled. Only object locks can be partitioned.

When lock partitioning is enabled, the NL, SCH-S, IS, IU, and IX lock modes are acquired on a single partition. All other lock modes must be acquired on all partitions.

One key thing to notice is that, when lock partitioning is enabled, object locks are acquired on lock partitions, not on the object itself. For example, if you run the below query on a server where lock partitioning is not enabled, you will get the below result (irrelevant results omitted).

USE AdventureWorks
GO

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN

SELECT * FROM Person.[Address]
WHERE AddressId = 1

SELECT * FROM Person.ContactType WITH(TABLOCKX)

SELECT OBJECT_NAME(resource_associated_entity_id) as objectName,resource_description, resource_lock_partition,request_mode  FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'

ROLLBACK TRAN

ObjectName resource_lock_partition request_mode
Address 0 IS
ContactType 0 X

If you run the same script on a machine on which lock partitioning is enabled, you will get the following results.

ObjectName resource_lock_partition request_mode
Address 5 IS
ContactType 15 X
ContactType 14 X
ContactType 13 X
ContactType 12 X
ContactType 11 X
ContactType 10 X
ContactType 9 X
ContactType 8 X
ContactType 7 X
ContactType 6 X
ContactType 5 X
ContactType 4 X
ContactType 3 X
ContactType 2 X
ContactType 1 X
ContactType 0 X

You can see that the IS lock is acquired on only one partition, whereas the X lock is acquired on all partitions. The actual number of partitions depends upon the number of available logical processors.
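The same observation can be made as a count by aggregating the DMV output per object and lock mode. The following is a minimal sketch, meant to be run inside the open transaction of the script above in place of the plain sys.dm_tran_locks query; the alias partitions_held is just an illustrative name.

```sql
-- Sketch: count the distinct lock partitions on which each object lock is held.
-- Run inside the open transaction, before the ROLLBACK TRAN.
SELECT OBJECT_NAME(resource_associated_entity_id, resource_database_id) AS objectName,
       request_mode,
       COUNT(DISTINCT resource_lock_partition) AS partitions_held
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
  AND request_session_id = @@SPID
GROUP BY resource_associated_entity_id, resource_database_id, request_mode
```

On a server without lock partitioning every row should report a single partition; with lock partitioning enabled, the X lock should report one partition per lock partition on the server.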

As the locking is done at the lock partition level, naturally these locks can participate in a deadlock and the resources listed under the resource list will be the lock partitions.

A deadlock can be identified as a lock partitioning related deadlock from the below attributes.

1. The wait resource is “Object”

2. The wait resource identifier will be in the format “7:1797581442:12”, where the last component (after the final colon) indicates the lock partition.

3. In the resource list the value for sub resource is “FULL”

4. Each entry in the resource list will carry an attribute in the format “lockPartition=12”

5. If you execute the statement involved in the deadlock and inspect the resource_lock_partition column in sys.dm_tran_locks, you will see a non-zero value.
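As a small illustration of the identifier format in item 2, the sample value can be split into its three components with PARSENAME. This is only a sketch: the variable name is hypothetical, and the database and object names will come back as NULL unless database id 7 and that object id actually exist on your server.

```sql
-- Sketch: split a wait resource of the form DBID:OBJECTID:LOCKPARTITION.
-- The sample value is the one quoted in the list above.
DECLARE @wait_resource varchar(100) = '7:1797581442:12'

-- PARSENAME splits on dots and counts the parts from the right.
DECLARE @parts varchar(100) = REPLACE(@wait_resource, ':', '.')

SELECT DB_NAME(CAST(PARSENAME(@parts, 3) AS int)) AS databaseName,
       OBJECT_NAME(CAST(PARSENAME(@parts, 2) AS int),
                   CAST(PARSENAME(@parts, 3) AS int)) AS objectName,
       CAST(PARSENAME(@parts, 1) AS int) AS lockPartition
```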

The following demo will illustrate a deadlock on lock partitions.

Setup

SQL Server automatically enables lock partitioning on machines with 16 or more logical processors. If you don’t have a machine with 16 cores, you can use the (undocumented and unsupported) startup parameter -Pn.

I’ve used -P16 to simulate 16 processors.

When lock partitioning is enabled, you will see the following message in the SQL server log.

Lock partitioning is enabled.  This is an informational message only. No user action is required.
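One way to look for this message without opening the log file is the xp_readerrorlog extended procedure. That procedure is undocumented, so treat the call below as an assumption about its commonly used parameters (log number, log type, search string) rather than a supported API. The second query simply reports how many CPUs and schedulers the instance sees.

```sql
-- Sketch: search the current error log for the lock partitioning message.
-- 0 = current log file, 1 = SQL Server error log (undocumented parameters).
EXEC xp_readerrorlog 0, 1, N'Lock partitioning'

-- Number of logical CPUs and schedulers visible to this instance.
SELECT cpu_count, scheduler_count
FROM sys.dm_os_sys_info
```

If the error log has been cycled since the last restart, the message may sit in an older log file, so an empty result from the first statement does not by itself mean that lock partitioning is off.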

Once you verify that lock partitioning is enabled, run the below script to create the table for our test.

IF DB_ID('DeadLockTest') IS NULL
    CREATE DATABASE DeadLockTest
GO

USE DeadLockTest
GO

CREATE TABLE dbo.lpdl(id int identity(1,1) primary key clustered, filler char(100) default 'abc')
GO
SET NOCOUNT ON
INSERT INTO dbo.lpdl DEFAULT VALUES
GO 100

Now open a new session (session 1) and run the below query.

BEGIN TRAN

UPDATE dbo.lpdl
SET filler = 'xyz'
WHERE id = 1

WAITFOR DELAY '00:00:05'

SELECT * FROM dbo.lpdl WITH(TABLOCKX)

ROLLBACK TRAN

Open another session (Session 2) and run the below query.

BEGIN TRAN

UPDATE dbo.lpdl
SET filler = 'xyz'
WHERE id = 2 -- a different row than Session 1, so that this UPDATE is not blocked on the row lock

WAITFOR DELAY '00:00:05'

SELECT * FROM dbo.lpdl WITH(TABLOCKX)

ROLLBACK TRAN

One of the sessions will fail with the deadlock error. You have probably noticed that we are forcing a conversion deadlock here by first acquiring a compatible IX lock and then trying to acquire an incompatible X lock on the same table.

Analyze

To analyze this deadlock we will go ahead and create our deadlock table.

Process         Resource               Mode  State  Command
process4ee8508  OBJECT:26:242099903:2  X     WAIT   SELECT
process4ee8508  OBJECT:26:242099903:0  X     GRANT  SELECT
process7c4988   OBJECT:26:242099903:0  X     WAIT   SELECT
process7c4988   OBJECT:26:242099903:2  IX    GRANT  UPDATE

The lock resource reported in the deadlock trace will be in the format OBJECT:DBID:OBJECTID:LOCKPARTITION. As we can see here, process process7c4988 is holding an IX lock on partition 2. Since an IX lock is acquired only on the local partition, we can assume that partition 2 is local to process7c4988. This process now requires an X lock on partition 0. When a process requires a lock in any mode that cannot be acquired on the local partition alone, it must acquire the lock on the partitions in sequential order, starting from partition 0. But since process process4ee8508 is already holding an X lock on partition 0, process process7c4988 is now blocked.

Meanwhile process4ee8508 is holding an X lock (converted its own IX lock on the partition to X lock) on partition 0 and it needs to acquire the X lock on the remaining 15 partitions in that order. When it reaches partition 2, it will be blocked since the other thread is holding an incompatible lock and a deadlock ensues.

Regardless of whether lock partitioning is enabled or not, the above script will cause a deadlock. But there are extremely rare scenarios where the lock partitioning feature will cause a deadlock whereas the normal locking behavior will only cause blocking. I will talk about that in the next post.

Fix

The key to avoiding this kind of deadlock is to avoid object-level locks. Deadlocks of this kind are pretty rare. I’ve found only one reference to a deadlock involving lock partitions in the MSDN forums.

This post is part of the series SQL Server deadlocks and live locks – Common Patterns. If you have questions, please post a comment here or reach me at @sqlindian