Members Bookmarks Fresher Jobs Funny Photos B.Tech Projects New Member FAQ  



My Profile
Active Members
TodayLast 7 Days more...



Awards & Gifts
Online Exams

Fresher Jobs


Our fresher job section is exclusively for fresh graduates! Find jobs for freshers in major Indian cities including Bangalore, Chennai, Hyderabad, Pune or Kochi

Resources


Find educational articles, blogs, discussion threads and other resources.

Colleges


Find details about any college in India or search for courses.

Paid Surveys


website counter



TRANSACTION MANAGEMENT in DBMS


Posted Date: 08 May 2008    Resource Type: Articles/Knowledge Sharing    Category: Computer & Technology

Posted By: durga       Member Level: Silver
Rating:     Points: 4



Transaction
• A sequence of many actions which are considered to be one atomic unit of work.
– Read, write, commit, abort
• Governed by four ACID properties:
– Atomicity, Consistency, Isolation, Durability
• Has a unique starting point, some actions and one end point
• DBMS “actions”:
– reads, writes
– Special actions: commit, abort
– For now, assume reads and writes are on tuples;
• A transaction is a unit of work which completes as a unit or fails as a unit.
The ACID Properties
• Atomicity: All actions in the transaction happen, or none happen.
• Consistency: If each transaction is consistent, and the DB starts consistent, it ends up consistent.
• Isolation: Execution of one transaction is isolated from that of other transactions.
• Durability: If a transaction commits, its effects persist.
• A transaction is a collection of operations involving data items in a database. Read, insert, delete, and update are example operations. There are four important properties of transactions that a DBMS must ensure to maintain data in the face of concurrent access and system failures:
– Atomicity: Users should be able to regard the execution of each transaction as atomic: either all actions are executed or none are executed. Users should not have to worry about the effect of incomplete transactions (example, when a system crash occurs).
– Consistency: Each transaction, run by itself with no concurrent execution of other transactions, must preserve the consistency of the database. This property is called consistency. Ensuring this property of the transaction is the responsibility of the user.
– Isolation: Users should be able to understand the transaction without considering the effect of other concurrently executing transactions, even if the DBMS interleaves the actions of several transactions. Transactions are isolated, or protected, from the effects of concurrently scheduling other transactions.
– Durability: Once the DBMS informs the user that a transaction has been successfully completed, its effects should persist even if the system crashes before all its changes are reflected on disk.
• Passing the ACID Test
Concurrency Control – Guarantees Consistency and Isolation, given Atomicity.
Logging and Recovery – Guarantees Atomicity and Durability.
• Atomicity
All-or-nothing, no partial results. An event either happens and is committed or fails and is rolled back.
– e.g. in a money transfer, debit one account, credit the other. Either both debiting and crediting operations succeed, or neither of them do.
– Transaction failure is called Abort
• Commit and abort are irrevocable actions. There is no undo for these actions.
• An Abort undoes operations that have already been executed
– For database operations, restore the data’s previous value from before the transaction (Rollback-it); a Rollback command will undo all actions taken since the last commit for that user.
– But some real world operations are not undoable.
Examples - transfer money, print ticket, fire missile
• Consistency
• Every transaction should maintain DB consistency
– Referential integrity - e.g. each order references an existing customer number and existing part numbers
– The books balance (debits = credits, assets = liabilities)
• Consistency preservation is a property of a transaction, not of the database mechanisms for controlling it (unlike the A, I, and D of ACID)
• If each transaction maintains consistency, then a serial execution of transactions does also
• A database state consists of the complete set of data values in the database
• A database state is consistent if the database obeys all the integrity constraint
• A transaction brings the database from one consistent state to another consistent state
Consistent state diagram
• Isolation
Intuitively, the effect of a set of transactions should be the same as if they ran independently.
• Formally, an interleaved execution of transactions is serializable if its effect is equivalent to a serial one.
• Implies a user view where the system runs each user’s transaction stand-alone.
• Of course, transactions in fact run with lots of concurrency, to use device parallelism – this will be covered later.
• Transactions can use common data (shared data)
• They can use the same data processing mechanisms (time sharing)
Durability
• When a transaction commits, its results will survive failures (e.g. of the application, OS, DB system … even of the disk).
• Makes it possible for a transaction to be a legal contract.
• Implementation is usually via a log
– DB system writes all transaction updates to a log file
– to commit, it adds a record “commit(Ti)” to the log
– When the commit record is on disk, the transaction is committed.
– system waits for disk ack before acknowledging to user
• Active: transaction is started and is issuing reads and writes to the database
• Partially committed: operations are done and values are ready to be written to the database
• Committed: writing to the database is permitted and successfully completed
• Abort: the transaction or the system detects a fatal error
• Terminated: transaction leaves the system
• A transaction reaches its commit point when all operations accessing the database are completed and the result has been recorded in the log. It then writes a [commit, ] and terminates
• When a system failure occurs, search the log file for entries [start, ]
and if there are no logged entries [commit, ]then undo all operations that have logged entries [write, , X, old_value, new_value]
Why is Concurrency Control Needed?
• Several problems occur when concurrent transactions execute in an uncontrolled manner
• A schedule of concurrent transactions is a particular sequence of interleaving of their read or write operations
• In general a transaction, has a set of data items it accesses (read set), and a set of data items it modifies (write set)
Lost Update Problem - Successfully completed update is overridden by another user.
• What should the final Order Value be?
• Which Update has been lost?
• Loss of T2’s update avoided by preventing T1 from reading balx until after update.
Uncommitted Dependency Problem / Temporary Update Problem
• Occurs when one transaction can see intermediate results of another transaction before it has committed.
• What should the final Order Value be?
• Where is the temporary update?
Uncommitted Dependency Problem
• Problem avoided by preventing T3 from reading balx until after T4 commits or aborts.
Incorrect Summary Problem
• What should the total Order Value be?
• Which order was accumulated before update, and which after?
• Schedules - actions of transactions as seen by the DBMS
• Schedule: An interleaving of actions from a set of transactions, where the actions of any one transaction are in the original order.
• When two or more transactions are running concurrently, the steps of the transactions would normally be interleaved. The interleaved execution of transactions is decided by the database scheduler which receives a stream of user requests that arise from the active transactions. A particular sequencing (usually interleaved) of the actions of a set of transactions is called a schedule.
– Represents some actual sequence of database actions.
– Example: R 1 (A), W 1 (A), R 2 (B), W 2 (B), R 1 (C), W 1 (C)
T1 T2
R(A)
W(A)
R(B)
W(B)
R(C)
W(C)
– In a complete schedule, each transaction ends in commit or abort.
Initial State + Schedule ® Final State
• Serial schedule is a schedule in which all the operations of one transaction are completed before another transaction can begin (that is, there is no interleaving).
– One sensible “isolated, consistent” schedule:
– Run transactions one at a time, in a series.
NOTE: Different serial schedules can have different final states; all are “OK”
-- DBMS makes no guarantees about the order in which concurrently submitted transactions are executed.
• Serializable Schedule
• A schedule whose effect on the DB “state” is the same as that of some serial schedule
– All serial schedules are serializable. But the reverse may not be true
• Let T be a set of n transactions . If the n transactions are executed serially (call this execution S), we assume they terminate properly and leave the database in a consistent state. A concurrent execution of the n transactions in T (call this execution C) is called serializable if the execution is computationally equivalent to a serial execution. There may be more than one such serial execution. That is, the concurrent execution C always produces exactly the same effect on the database as some serial execution S does. (Note that S is some serial execution of T, not necessarily the order). A serial schedule is always correct since we assume transactions do not depend on each other and furthermore, we assume, that each transaction when run in isolation transforms a consistent database into a new consistent state and therefore a set of transactions executed one at a time (i.e. serially) must also be correct.
– Final state is what some serial schedule would have produced.
– Aborted Xacts are not part of schedule; ignore them for now (they are made to `disappear’ by using logging).
In general, serializability is a good property – But difficult to achieve due to lack of effective algorithms.
• Serializability
• Objective of a concurrency control protocol is to schedule transactions in such a way as to avoid any interference.
• Could run transactions serially, but this limits degree of concurrency or parallelism in system.
• Serializability identifies those executions of transactions guaranteed to ensure consistency.
Schedule
Sequence of reads/writes by set of concurrent transactions.
Serial Schedule
Schedule where operations of each transaction are executed consecutively without any interleaved operations from other transactions.
Nonserial Schedule
• Schedule where operations from set of concurrent transactions are interleaved.
• Objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another.
• In other words, want to find nonserial schedules that are equivalent to some serial schedule. Such a schedule is called serializable.
In serializability, ordering of read/writes is important:
(a) If two transactions only read a data item, they do not conflict and order is not important.
(b) If two transactions either read or write completely separate data items, they do not conflict and order is not important.
(c) If one transaction writes a data item and another reads or writes same data item, order of execution is important.
• Serializability Violations
• Two transactions T1 and T2 are said to conflict if some action t1 of T1 and an action t2 of T2 access the same object and at least one of the actions is a write. The conflict is called a RW-conflict if the write set of one transaction intersects with the read set of another. A WW-conflict occurs if the conflict is between two writes.
– Result is not equal to any serial execution!
– W- R conflict: T2 reads something T1 wrote previously (dirty read).
RW Conflicts (Unrepeatable Read)
– T2 overwrites what T1 read.
– If T1 reads it again, it will see something new!
Example when this would happen? The increment/ decrement example.
– Again, not equivalent to a serial execution.
WW Conflicts (Overwriting Uncommitted Data)
– T2 overwrites what T1 wrote. Example: 2 transactions to update items to be kept equal.
– Usually occurs in conjunction with other anomalies.
Aborted Transactions: All actions of aborted transactions are to be undone
– as if aborted transactions never happened.
Two Issues:
– How does one undo the effects of a transaction? We’ll cover this in logging/ recovery
– What if another transaction sees these effects?? Must undo that transaction as well!
• Cascading Aborts
T1 T2
R(A)
W(A)
R(A)
W(A)
abort
Abort of T1 requires abort of T2! – Cascading Abort
• What about WW conflicts & aborts?
– T2 overwrites a value that T1 writes.
– T1 aborts: its “remembered” values are restored.
– Lose T2’s write! We will see how to solve this, too.
• Recoverable Schedule
A schedule where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.
Unrecoverable Schedule
T1 T2
R(A)
W(A)
R(A)
W(A)
commit
abort
Recoverable Schedule
T1 T2
R(A)
W(A)
R(A)
W(A)
commit
commit
Abort of T1 in first figure requires abort of T2!
– But T2 has already committed and hence cannot undo its actions! This is unrecoverable schedule. A recoverable schedule is one in which this cannot happen.
– i. e. a transaction commits only after all the transactions it “depends on” (i. e. it reads from or overwrites) commit.
An ACA (avoids cascading abort) schedule is one in which cascading abort cannot arise.
– A transaction only reads/ writes data from committed transactions.
– Recoverable implies ACA (but not vice- versa!). Real systems typically ensure that only recoverable schedules arise (through locking).
• Concurrency Control Techniques
• Two basic concurrency control techniques:
– Locking,
– Timestamping.
• Both are conservative approaches: delay transactions in case they conflict with other transactions. Optimistic methods assume conflict is rare and only check for conflicts at commit.
• Locking
The concept of locking data items is one of the main techniques for controlling the concurrent execution of transactions.
• A lock is a variable associated with a data item in the database.
– Generally there is a lock for each data item in the database.
• A lock describes the status of the data item with respect to possible operations that can be applied to that item
– used for synchronising the access by concurrent transactions to the database items.
• A transaction locks an object before using it
• When an object is locked by another transaction, the requesting transaction must wait
• Binary locks have two possible states:
• 1.locked (lock_item (X) operation) and
• 2.unlocked (unlock (X) operation
• Multiple-mode locks allow concurrent access to the same item by several transactions. Three possible states:
• 1.read locked or shared locked (other transactions are allowed to read the item)
• 2.write locked or exclusive locked (a single transaction exclusively holds the lock on the item) and
• 3.unlocked.
• Locks are held in a lock table.
• upgrade lock: read lock to write lock
• downgrade lock: write lock to read lock

• TYPES OF LOCKS
• SHARED LOCKS
• EXCLUSIVE LOCKS
Transaction uses locks to deny access to other transactions and so prevent incorrect updates.
• Most widely used approach to ensure serializability.
• Generally, a transaction must claim a shared (read) or exclusive (write) lock on a data item before read or write.
• Lock prevents another transaction from modifying item or even reading it, in the case of a write lock.
• Locking - Basic Rules
• If transaction has shared lock on item, can read but not update item.
• If transaction has exclusive lock on item, can both read and update item.
• Reads cannot conflict, so more than one transaction can hold shared locks simultaneously on same item.
• Exclusive lock gives transaction exclusive access to that item.
• Some systems allow transaction to upgrade read lock to an exclusive lock, or downgrade exclusive lock to a shared lock.
• Two-Phase Locking (2PL)
Strict 2PL:
– If T wants to read an object, first obtains an S lock.
– If T wants to modify an object, first obtains X lock.
– Hold all locks until end of transaction.
– Guarantees serializability, and recoverable schedule, too! also avoids WW problems!
2PL:
– Slight variant of strict 2PL
– transactions can release locks before the end (commit or abort)
? But after releasing any lock it can acquire no new locks – Guarantees serializability
A two-phase locking ( 2PL) scheme is a locking scheme in which a transaction cannot request a new lock after releasing a lock. Two phase locking therefore involves two phases:
– Growing Phase ( Locking Phase) - When locks are acquired and none released.
– Shrinking Phase ( Unlocking Phase) - When locks are released and none acquired.
The attraction of the two-phase algorithm derives from a theorem which proves that the two-phase locking algorithm always leads to serializable schedules. This is a sufficient condition for serializability although it is not necessary.
Strict two-phase locking ( Strict 2PL) is the most widely used locking protocol, and has following two rules:
– If a transaction wants to read (respectively, modify) an object, it first requests a shared (respectively, exclusive) lock on the object.
–All locks held by a transaction are released when the transaction is completed
In effect the locking protocol allows only ‘safe’ interleavings of transactions.
Q) Three transactions A, B and C arrive in the time sequence A, then B and then C. The transactions are run concurrently on the database. Can we predict what the result would be if 2PL is used?
– No, we cannot do that since we are not able to predict which serial schedule the 2PL schedule is going to be equivalent to. The 2PL schedule could be equivalent to any of the following six serial schedules: ABC, ACB, BAC, BCA, CAB, CBA.
• Two-Phase Locking (2PL)
Transaction follows 2PL protocol if all locking operations precede first unlock operation in the transaction.
• Two phases for transaction:
– Growing phase - acquires all locks but cannot release any locks.
– Shrinking phase - releases locks but cannot acquire any new locks.
• Preventing Lost Update Problem using 2PL Eg in slides
• Preventing Uncommitted Dependency Problem using 2PL
• Locking Granularity
A database item which can be locked could be
– a database record
– a field value of a database record
– the whole database
• Trade-offs
?coarse granularity - the larger the data item size, the lower the degree of concurrency
?fine granularity - the smaller the data item size, the more locks to be managed and stored, and the more lock/unlock operations needed.
Deadlock
An impasse that may result when two (or more) transactions are each waiting for locks held by the other to be released. Eg in slides
• Conditions For Deadlock
• Mutual Exclusion
• Hold And Wait
• Non Preemption
• Circular Wait
Recovery
• Occurs in case of transaction failures.
• Database (DB) is restored to the most recent consistent state just before the time of failure.
• To do this, the DB system needs information about changes applied by various transactions. It is the system log.
Contents of System Log:
• [start_transaction, T]: Indicates that transaction T has started execution.
• [write_item, T, X, old_value, new_value]: Indicates that transaction T has changed the value of DB item X from old_value to new_value.
• [read_item, T, X]: Indicates that transaction T has read the value of DB item X.
• [commit, T]: Indicates that transaction T has completed successfully, and affirms that its effect can be committed (recorded permenantly) to the database.
• [abort, T]: Indicates that transaction T has been aborted.
Deadlock
• Only one way to break deadlock: abort one or more of the transactions.
• Deadlock should be transparent to user, so DBMS should restart transaction(s).
• Three general techniques for handling deadlock:
– Timeouts.
– Deadlock prevention.
– Deadlock detection and recovery.
Timeouts
• Transaction that requests lock will only wait for a system-defined period of time.
• If lock has not been granted within this period, lock request times out.
• In this case, DBMS assumes transaction may be deadlocked, even though it may not be, and it aborts and automatically restarts the transaction.
Deadlock Prevention
• DBMS looks ahead to see if transaction would cause deadlock and never allows deadlock to occur.
• Could order transactions using transaction timestamps:
– Wait-Die - only an older transaction can wait for younger one, otherwise transaction is aborted (dies) and restarted with same timestamp.
– Wound-Wait - only a younger transaction can wait for an older one. If older transaction requests lock held by younger one, younger one is aborted (wounded).
Deadlock Detection and Recovery
• DBMS allows deadlock to occur but recognizes it and breaks it.
• Usually handled by construction of wait-for graph (WFG) showing transaction dependencies:
– Create a node for each transaction.
– Create edge Ti -> Tj, if Ti waiting to lock item locked by Tj.
• Deadlock exists if and only if WFG contains cycle.
• WFG is created at regular intervals.
• Recovery Outline
• Restore to most recent “consistent” state just before time of failure
– Use data in the log file
• Catastrophic Failure
– Restore database from backup
– Replay transactions from log file
• Database becomes inconsistent (non-catastrophic errors)
– Undo or Redo last transactions until consistent state is restored
Recovery Algorithms for Non-catastrophic Errors:
Deferred Update (NO-UNDO/REDO):
– Data written to buffers
– Not physically updated until after commit point reached and logs have been updated
– No undo is even necessary
– Redo might be necessary on transactions that have been logged but not physically updated
– Known as the NO-UNDO/REDO algorithm
Immediate Update (UNDO/REDO):
– Database being updated as transaction occurs
– However, log always force written first
– Partially completed transactions will have to be undone
– Committed transactions might have to be redone
– Known as the UNDO/REDO algorithm
– Variation on the scheme:
• Data is physically updated before commit
• Only requires UNDO
• Known as the UNDO/NO-REDO algorithm
• Logging
• Record REDO and UNDO information, for every update, in a log.
– Sequential writes to log (put it on a separate disk).
– Minimal info (diff) written to log, so multiple updates fit in a single log page.
Log: An ordered list of REDO/ UNDO actions
– Log record contains:
– and additional control info
Caching of Disk Blocks:
– Disk blocks typically cached to main memory
– Changes made to cache block which is then written back at some later time
– Many DBMS’s even handle the low-level I/O
DBMS Caching:
– All database accesses check to see if required item is in the cache first. If not, item is loaded into cache
– Dirty bit: Determines if cache block has been updated and needs to be written back to disk
– Pin/Unpin bit: Is it OK to write block back to disk yet?
– In-place updating: Block is written back out to same location. Overwrite original
– Shadowing: Block is written to new location
? Old copy is kept
? Before Image (BFIM) & After Image (AFIM)
• The Write- Ahead Logging Protocol:
– Must force the log record for an update before the corresponding data page gets to disk.
– Must write all log records for a transaction before commit
• The rule that all transactions follow in the WAL protocol is "Write the log before you write the data”. When a transaction wants to update a record, it pins the page containing the record in the main-memory buffer pool, modifies the page in memory, generates an undo/redo record, forces the undo/redo record to the log, and unpins the page in the buffer pool. At some later time, the page replacement algorithm or a checkpoint will write the page back to the database.
• WAL protocol:
• #1 guarantees Atomicity.
• #2 guarantees Durability.
• Each log record has a unique Log Sequence Number (LSN). LSNs always increasing.
• Each data page contains a pageLSN. The LSN of the most recent log record for an update to that page.
• System keeps track of flushedLSN. The max LSN flushed so far.
• WAL: Before a page is written, pageLSN <= flushedLSN
• Normal Execution of a transaction
• Series of reads & writes, followed by commit or abort. We will assume that write is atomic on disk.
• In practice, additional details to deal with non- atomic writes.
• Strict 2PL. STEAL, NO- FORCE buffer management, with Write- Ahead Logging.
Checkpoints in the System Log
• Checkpoint record written in log when all updated DB buffers written out to disk
• Any committed transaction occurring before checkpoint in log can be considered permanent (won’t have to be redone after crash)
• Actions
– suspend execution of all transactions
– force-write all modified buffers to disk
– write checkpoint entry in log and force write log
– resume transactions
• Checkpointing: Periodically, the DBMS creates a checkpoint, in order to minimize the time taken to recover in the event of a system crash. It quiesces the system (makes all currently executing transactions pause), writes all dirty buffers to disk, and then allows transactions to resume normal processing.
• The problem with this simple checkpoint is that it makes the data unavailable for too long, possibly several minutes, while the checkpoint is being done. There is another technique called Fuzzy checkpoint, which does not have this problem, because it does not quiesce the system. Instead, for each buffer, the fuzzy checkpoint procedure latches the buffer (gets an exclusive semaphore on it), writes it to disk if it is dirty, and then unlatches the buffer. In addition, fuzzy checkpoint writes the ID’s of the currently active transactions to the log. Fuzzy checkpoint just locks buffers one at a time and releases them.
• How does the system recovers from a crash: The crash recovery algorithm reads the most recent checkpoint information from the log, which yields a set of transaction ID’s that were active at the time of the checkpoint. Then it scans the log forward from the checkpoint, reapplying every undo/redo record to the database (this is called REDO ALL). During the forward pass, it analyzes the log to determine which transactions did not commit or abort before the crash. These transactions are called the "losers." Then, the recovery algorithm scans the log in reverse, undoing log records for all the losers (this is called UNDO LOSERS).
• Additional Crash Issues: What happens if system crashes during Analysis? During REDO? How do you limit the amount of work in REDO ?
– Flush asynchronously in the background.
– Watch “hot spots”!
• How do you limit the amount of work in UNDO ?
– Avoid long- running transactions.





Responses

Author: sri phani kumari    19 May 2008Member Level: Gold   Points : 2
good article


Feedbacks      
Popular Tags   What are tags ?   Search Tags  
(No tags found.)

Post Feedback


This is a strictly moderated forum. Only approved messages will appear in the site. Please use 'Spell Check' in Google toolbar before you submit.
You must Sign In to post a response.
Next Resource: Data Mining
Previous Resource: Mobile Ad Hoc Networks
Return to Discussion Resource Index
Post New Resource
Category: Computer & Technology


Post resources and earn money!
 
Related Resources


Contact Us    Privacy Policy    Terms Of Use   

SpiderWorks Technologies Pvt Ltd. 2006 - 2007 All Rights Reserved.