Sunday, August 8, 2010

Locking in DB2

Locking and concurrency issues can have a significant impact on the performance of a DB2 application. A good understanding of how the various serialization mechanisms work in DB2 will help application developers to take informed design decisions. Happy Reading!

Locking & Lock Management

Locking and lock management are a must in any multithread environment that uses a shared, updateable database, so that all users access the same version at any one time. Without proper locking, application processes that update would have to be single-threaded, thereby reducing throughput and increasing response time. Running batch processes concurrently with online processes would be nearly impossible.

Locking is required to allow inter-process concurrency and avoid data integrity problems. Locking is therefore a fundamental and implicit system service provided by a database management system (DBMS). Two or more independent application processes must be prevented from simultaneously accessing and updating specific data occurrences, whether the occurrences be called rows as in DB2, segments as in information management system (IMS), or logical records as for the virtual storage access method (VSAM).

Locking is a partnership between the DBMS which provides concurrency and consistency and the application that drives granularity and locking. Everything that the DBMS can “guarantee”, the application can “undo” with a combination of caching data across commits, failure to check return codes, or inadequate and incorrect restart logic.

Locking Modes

DB2 takes locks on table spaces, tables (if table space is segmented), and pages or rows. The three locking modes are:

Share – The lock owner and any concurrent process can read, but not change, the locked DB2 object. Other concurrent processes may acquire Share or Update locks on the DB2 object. Also called S-lock.

Update – The lock owner can read the DB2 object and intends to change it. Concurrent processes may acquire Share locks and read the DB2 object but no other process can acquire an Update lock. Update locks must be promoted to Exclusive locks before DB2 actually changes the DB2 object. Also called U-lock.

Exclusive – Only the lock owner can read or change the locked data, with exceptions like concurrent applications using uncommitted read isolation. Also called X-lock.
 
Basic reasons for Locking

Reason 1 - Losing updated data

Without concurrency control, two processes, A and B, might both read the same row from the database, and both calculate new values for one of its columns, based on what they read. If A updates the row with its new value, and then B updates the same row, A's update is lost.

With locking, if the application uses updatable cursors (i.e., with ‘for update of’ clause), DB2 grants a U-lock to user A. User B requests for U-lock and will wait till user A releases the lock (either commits or rolls back). If updatable cursors are not used, then DB2 grants an S-lock to User A and consequently grants an S-lock to User B. When User A and User B try to update the column value, they get into a deadlock situation. When a deadlock exists, DB2 decides whether User A or User B should be rolled back.

Reason 2 - Reading uncommitted data

Also without concurrency control, process A might update a value in the database, and process B might read that value before it was committed. Then, if A's value is not later committed, but backed out, B's calculations are based on uncommitted (and presumably incorrect) data.

With locking, DB2 grants a X-lock to user A. User B tries to get a S-lock and will have to wait till user A commits or rolls back the change.

Reason 3 - Unrepeatable read within a unit of work

User A wants to read the same data twice. It is essential that another user has not updated the data between the two read processes.

Let us look at the following example:

SELECT * FROM EMP

WHERE SALARY >

(SELECT AVG(SALARY) FROM EMP)

This SQL statement reads the EMP (employee) table twice: once to find the average salary and next to find all rows with salary greater than the average. Without any locking between the two read processes, another user can update the EMP table between the two read processes. This update can lead to a wrong result for user A.

With DB2 locking, the following are the options the application developer can choose from depending on the need for concurrency:

1. Use the package or plan isolation level repeatable read (RR), or use the WITH RR clause in the SQL SELECT statement.

2. Lock the table in share or exclusive mode explicitly, although exclusive mode is an overkill in this particular case.

3. Mark the table space Read-only. Option is not recommended.

Effects of Locking

Lock Suspension

An application process is suspended when it requests a lock that is already held by another application process and cannot be shared. The suspended process temporarily stops running. For example, using an application for inventory control, two users attempt to reduce the quantity on hand of the same item at the same time. The two lock requests are queued. The second request in the queue is suspended and waits until the first request releases its lock. Promotion to Update lock to Exclusive lock may cause a suspension if other processes are holding Share locks.

The suspended process resumes running when:

All processes that hold the conflicting lock release it.

The requesting process times out or deadlocks and the process resumes to deal with an error condition.

Timeout

An application process is said to time out when it is terminated because it has been suspended for longer than a preset interval. COMMIT and ROLLBACK operations do not time out.

Deadlock

A deadlock occurs when two or more application processes each hold locks on resources that the others need and without which they cannot proceed.

Jobs EMPLJCHG and PROJNCHG are two transactions.

1.Job EMPLJCHG requests a lock for page A of table N while holding the lock on page B of table M. The job is suspended, because job PROJNCHG is holding an exclusive lock on page A.

2.Job PROJNCHG requests a lock for page B of table M while holding the lock on page A of table N. The job is suspended, because job EMPLJCHG is holding an exclusive lock on page B.

The situation is a deadlock. DB2 scans for deadlocked processes at regular intervals according to the value set as DEADLOCK TIME during installation. DB2 can roll back the current unit of work for one of the processes or request a process to terminate. That frees the locks and allows the remaining processes to continue.

It is possible for two processes to be running on separate DB2 subsystems, each trying to access a resource at the other location. In that case, neither subsystem can detect that the two processes are in deadlock; the situation is resolved only when one process times out.

Lock sizes

For DB2 objects, we have the LOCKSIZE option on the CREATE TABLESPACE and ALTER TABLESPACE statements. The choices are ANY, ROW, PAGE, TABLE, or TABLESPACE, with ANY as the default.

ANY permits DB2 to make the final choice and DB2 favors page locking as a good compromise between high concurrency and high CPU consumption. Most often, DB2 implicitly locks at the page level, but certain SQL DML statements may be such that DB2 has the option to use table or table-space level or even row locking as an alternative.

PAGE is a good design default to allow for interprocess concurrency with read and write intent against a table.

Row locking can be considered to improve concurrency still further (reduce suspensions, timeouts) but it may increase locking cost for sequential update processing and introduce more opportunities for deadlocks.

The TABLE or TABLESPACE option should rarely be used in an online environment that emphasizes shared data. The explicit SQL statement LOCK TABLE is another option available. TABLE or TABLESPACE options could be useful where the data is read-only; a single user or an update process requires exclusive control over the table. Taking one lock on the entire table or table space saves the CPU overhead of locking individual pages as they are accessed. The more pages accessed by the process, the greater the savings.

Duration of Lock


The duration of a lock is defined as the length of time a lock is held and varies according to the type of lock.

If a page or row is locked, DB2 acquires the lock only when it is needed (i.e., when individual pages or rows are accessed). When the lock is released depends on whether the data is accessed for read-only or for update and on the isolation level specified, but it is rarely held beyond the next commit point.

Partition, table, and table space locks can be acquired when a plan is first allocated, or you can delay acquiring them until the resource they lock is first used. They can be released at the next commit point or be held until the program terminates.

Effects of Lock

Effects of locking depend upon the size and duration of the locks. For maximum concurrency, locks on a small amount of data held for a short duration are better than locks on a large amount of data held for a long duration. However, acquiring a lock requires processor time, and holding a lock requires storage; thus, acquiring and holding one table space lock is more economical than acquiring and holding many page locks. So it is essentially a trade-off between performance and concurrency.

Lock Escalation

Lock escalation is the releasing of all the page or row locks that a process holds on a single table or table space in order to acquire a table or table space lock instead. Lock escalation balances concurrency with performance by using page or row locks while a process accesses relatively few pages or rows, then changing to table space or table locks when the process accesses many.

LOCKMAX is an option on the CREATE TABLESPACE and ALTER TABLESPACE statements. It defines the maximum number of page or row locks an application process can hold simultaneously in the table space. If a program requests more than that number, locks are escalated.

Objects of Lock

The object of a lock is the resource being locked.
User data in target tables - A target table is a table that is accessed specifically in an SQL statement, either by name or through a view. Locks on those tables are the most common concern, and the ones over which you have most control.

User data in related tables - Operations subject to referential constraints can require locks on related tables. For example, if you delete from a parent table, DB2 might delete rows from the dependent table as well. In that case, DB2 locks data in the dependent table as well as in the parent table. If the application uses triggers, any triggered SQL statements can cause additional locks to be acquired.

DB2 internal objects – Locks on the following internal tables: Portions of the DB2 catalog, the skeleton cursor table (SKCT) representing an application plan, the skeleton package table (SKPT) representing a package and the database descriptor (DBD) representing a DB2 database.

Indexes and data-only locking

DB2 Version 4 introduced a new type of index (type 2 index) that eliminate locks on index pages so that your applications can avoid deadlock and timeout problems on the index. No index page locks are acquired during processing. Instead, DB2 uses a technique called data-only locking to serialize changes. Index page latches are acquired to serialize changes within a page and guarantee that the page is physically consistent. Acquiring page latches ensures that transactions accessing the same index page concurrently do not see the page in a partially changed state.

The data locks can be held for a long duration such as until commit. However, the page latches are only held for a short duration while the transaction is accessing the page. Because the index pages are not locked, hot spot insert scenarios (which involve several transactions trying to insert different entries into the same index page at the same time) do not cause contention problems in the index.

A query that uses index-only access might lock the data page or row, and that lock can contend with other processes that lock the data. However, using lock avoidance techniques can reduce the contention.

Lock Avoidance

Lock avoidance is a mechanism used under certain circumstances by DB2 to increase concurrency, at the same time avoiding calls to IRLM to request a page or row lock.

In a lock-avoidance situation, DB2 reads a page without first taking an IRLM lock; hence, the page might contain uncommitted data. To prevent retrieval of uncommitted data, DB2 uses a combination of mechanisms:

Page latching controlled by DB2 is a very efficient way to ensure physical consistency of the page, just as locks ensure that data is committed.

Commit Log Sequence Number (CLSN) to check for the committed state at a page level - The page oriented CLSN technique takes advantage of the page log relative byte address (last change RBA) in the header of every page. If the page log relative byte address value of the page is less than the CLSN for the page set, all of the rows in the page are in a committed state and no lock is required to ensure the logical consistency of the data in the page.

Possibly UNCommitted (PUNC) a deeper checking at a row level, done only if the CLSN check fails – Row-oriented PUNC technique checks the PUNC mark that indicates if an individual row in a page is committed.


Users have no direct control over the use of lock avoidance. Read-only SQL can use lock avoidance techniques to avoid locking. Other lock avoidance or reduction techniques are use of CURRENTDATA(NO), ACQUIRE(USE), RELEASE(DEALLOCATE), isolation, lock size and frequent commits.

No comments:

Post a Comment