How Oracle database does instance recovery after failures?

INSTANCE RECOVERY – Oracle database have inherit feature of recovering from instance failures automatically. SMON is the background process which plays a key role in making this possible. Though this is an automatic process that runs after the instance faces a failure, it is very important for every DBA to understand how is it made possible. In this blog, we are going to understand the same with the help of a flow diagram and a demonstration.

Instance recovery is the way of bringing back the transactions which are not yet written to the disk by DBWR at the moment of instance failure. Instance failures are possible in multiple ways: hardware issues, power glitches, network outages, abnormal instance shutdowns (SHUTDOWN ABORT). Database startup after any of these failures will automatically trigger the instance recovery. Let us now get a clear understanding of this process with the help of the flow diagram and with hands-on activity as well. This exercise would similar to the one we did in “Why should we configure limits.conf for Oracle database?” blog.

Recovery concepts:

27_2

Consider the image above, let us get into details of this flow.

  1. At 9:00 AM consider your database is working normally.
  2. At 9:10 AM think of database running 30 transactions in which 20 are committed and 10 are uncommitted, they are yet to be decided by user process to commit or rollback.
  3. While this transaction is still running, let us consider that instance is terminated at 9:11 AM.
  4. Now as we bring up the instance, a perfect recovery is the one that makes sure 20 transactions are not lost and 10 uncommitted transactions are lost. Oracle does it perfectly. Hoping so, startup up the instance normally.
  5. Once database mounts, then the initial phases of recovery begin.
  6. With the help of CKPT background process, instance recognizes any loss of transactions with the compatibility checks of SCN numbers between control files and data files.
  7. All the transactions which are lost without updated to the datafiles are re-ran with the help of REDO Which means that all 30 transactions are ran as on 9:10 AM now. This action is called as ROLL FORWARD.
  8. Database then checks the accessibility of UNDO segments and opens the database.
  9. On opening the database, now the 10 transactions which are not committed are rolled back making the database consistent with 20 committed transactions with the help of UNDO This action is called as ROLL BACK.
  10. Now that instance recovery is completed, Database is not recovered with 20 committed transactions. But yes, blocks would be updated into datafiles asynchronously by DBWR as usual.

We will now see the demonstration of the same to relate it with the reality.

  • Oracle database version used: 12.1.0.2
  • Host machine: OEL
  • Database: NON-CDB
  1. Database is working normally now.


SQL> select instance_name from v$instance;

INSTANCE_NAME

—————-

ORADB

Transactions are always running the database in the SYSTEM and SYSAUX tablesapces. We do not need to run any as of now.

  1. Let us terminate the instance now.

[oracle@OEL-12c trace]$ ps -ef | grep smon

oracle    4029     1  0 Oct21 ?        00:00:03 ora_smon_ORADB

oracle   26131  1172  0 16:32 pts/0    00:00:00 grep –color=auto smon

[oracle@OEL-12c trace]$ kill -9 4029

  1. Let us now startup the database and monitor the alert log file. Below is the important output of the alert log file.

 

Started redo scan

Sun Oct 22 16:34:11 2017

Completed redo scan

read 2450 KB redo, 553 data blocks need recovery

Sun Oct 22 16:34:11 2017

Started redo application at

Thread 1: logseq 11, block 15046

Sun Oct 22 16:34:11 2017

Recovery of Online Redo Log: Thread 1 Group 2 Seq 11 Reading mem 0

Mem# 0: /oradata/ORADB/onlinelog/o1_mf_2_drl8n920_.log

Mem# 1: /oradata/fra/ORADB/onlinelog/group2_member2.log

Sun Oct 22 16:34:11 2017

Completed redo application of 1.92MB

Sun Oct 22 16:34:11 2017

Completed crash recovery at

Thread 1: logseq 11, block 19947, scn 3934834

Alert log clearly says that REDO has been applied with the transactions which are not committed when instance terminated.

  1. As soon as the instance is opened, roll back operation is performed and below is the output of the same.

SQL> SELECT r.NAME “RB Segment Name”, dba_seg.size_mb,

DECODE(TRUNC(SYSDATE – LOGON_TIME), 0, NULL, TRUNC(SYSDATE – LOGON_TIME) || ‘ Days’ || ‘ + ‘) ||

TO_CHAR(TO_DATE(TRUNC(MOD(SYSDATE-LOGON_TIME,1) * 86400), ‘SSSSS’), ‘HH24:MI:SS’) LOGON,

v$session.SID, v$session.SERIAL#, p.SOSID,

v$session.USERNAME, v$session.STATUS

FROM v$lock l, v$process p, v$rollname r, v$session,

(SELECT segment_name, ROUND(bytes/(1024*1024),2) size_mb FROM dba_segments

WHERE segment_type = ‘TYPE2 UNDO’ ORDER BY bytes DESC) dba_seg

WHERE l.SID = p.pid(+) AND

v$session.SID = l.SID AND

TRUNC (l.id1(+)/65536)=r.usn AND

l.TYPE(+) = ‘TX’ AND

l.lmode(+) = 6

AND r.NAME = dba_seg.segment_name

ORDER BY size_mb DESC;

RB Segment Name                   SIZE_MB

—————————— ———-

LOGON                                                           SID    SERIAL#

——————————————————– ———- ———-

SOSID                    USERNAME                       STATUS

———————— —————————— ——–

_SYSSMU3_3285411314$                 2.13

00:00:09                                                         19      41123

26475                                                   ACTIVE

 

I love to see your comments!

Feel free to give additional information in the comments, that I can validate and include in blog posts or update this as well.

Related Posts

About The Author

2 Comments

  1. sriramchowdary009@gmail.com'
    Sriram bandarupalli
    October 25, 2017
    • Pawan Kumar Yaddanapudi
      October 27, 2017

Add Comment