This post highlights the basic steps to upgrade an Oracle 10g RAC database from
10.2.0.4 to 10.2.0.5, along with the obstacles faced and their solutions.
Systems:
Racdb1/Racdb2 ~ Sun OS 5.10 ~ Oracle 10204 ~ ASM ~ Primary
Racdbdr1/Racdbdr2 ~ Sun OS 5.10 ~ Oracle 10204 ~ ASM ~ Standby
Activity: Apply the 10.2.0.5 patch set (8202632) + CRS PSU Jan 2011 (9952245) + RDBMS PSU Jul 2011 (12419392)
to the existing 10.2.0.4 Oracle and cluster binaries
Activity Sequence:
Upgrading 10g RAC from 10204 to 10205 involves the basic steps below, which need to be
performed on both the primary and DR setups sequentially
1. Pre-activity checks
ORACLE_HOME/ORA_CRS_HOME/oraInventory/OCR backup
Invalid objects/indexes, backup status, datafile status
OPatch version verification, current patch set
2. Deferring archive sync
Archive shipping from primary Racdb1/Racdb2 to standby Racdbdr1/Racdbdr2 deferred
3. Shut all services
Shut down all Oracle/CRS services on all nodes & verify
4. Apply 10.2.0.5 Patch Set on ORA_CRS_HOME
Apply the 10.2.0.5 patch set on ORA_CRS_HOME
5. Perform post patch steps for CRS
Execute $ORA_CRS_HOME/install/root102.sh on both primary nodes
This script performs lib file patching & starts up all cluster services
6. Apply 10.2.0.5 Patch Set on ORACLE_HOME
Stop all services started post execution of root102.sh
Apply the 10.2.0.5 patch set on ORACLE_HOME
7. Perform post patch steps for RDBMS
Execute $ORACLE_HOME/root.sh on both primary nodes
Set cluster_database to FALSE in no-mount stage, then start up only one instance
@catupgrd.sql
@utlrp.sql
8. Verify 10.2.0.5 patch
Verify component status from dba_registry or registry$
Verify invalid objects
9. Backup ORACLE_HOME/ORA_CRS_HOME/oraInventory/OCR
10. Apply CRS PSU to ORA_CRS_HOME
11. Apply CRS PSU to ORACLE_HOME
12. Perform post PSU steps
@catbundle.sql psu apply
Verify registry$history
Verify invalid objects
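The sequence above can be sketched as a small set of shell helpers. This is only an illustration under stated assumptions, not the exact commands used during the activity: the backup directory, the archive destination number passed to `defer_sql`, and the database and node names passed to `stop_stack` are hypothetical, and by default nothing is executed (commands are only printed). `STARTUP NOMOUNT` is used here to reach the no-mount stage named in step 7.

```shell
#!/bin/bash
# Sketch of the activity sequence; helper names and BACKUP_DIR are assumptions.
DRY_RUN=${DRY_RUN:-1}                        # 1 = print commands only
BACKUP_DIR=${BACKUP_DIR:-/backup/pre_10205}  # hypothetical location

run() {  # print the command in dry-run mode, execute it otherwise
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

pre_activity_backup() {  # step 1 (the OCR export has to run as root)
  run tar -cf "$BACKUP_DIR/oracle_home.tar" "$ORACLE_HOME"
  run tar -cf "$BACKUP_DIR/crs_home.tar" "$ORA_CRS_HOME"
  run "$ORA_CRS_HOME/bin/ocrconfig" -export "$BACKUP_DIR/ocr.exp"
  run "$ORACLE_HOME/OPatch/opatch" lsinventory
}

defer_sql() {  # step 2: $1 = archive destination number of the standby (assumed)
  echo "ALTER SYSTEM SET log_archive_dest_state_$1=DEFER SCOPE=BOTH SID='*';"
}

stop_stack() {  # step 3: $1 = database name, remaining args = node names
  db=$1; shift
  run srvctl stop database -d "$db"
  for node in "$@"; do
    run srvctl stop asm -n "$node"
    run srvctl stop nodeapps -n "$node"
  done
  run crsctl stop crs   # as root, repeated on every node
}

upgrade_sql() {  # step 7: run as SYSDBA on one instance only
  cat <<'EOF'
STARTUP NOMOUNT
ALTER SYSTEM SET cluster_database=FALSE SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP UPGRADE
@?/rdbms/admin/catupgrd.sql
SHUTDOWN IMMEDIATE
STARTUP
@?/rdbms/admin/utlrp.sql
EOF
}

psu_sql() {  # step 12
  echo "@?/rdbms/admin/catbundle.sql psu apply"
}

verify_sql() {  # steps 8 and 12; the quoted heredoc keeps registry$history literal
  cat <<'EOF'
SELECT comp_name, version, status FROM dba_registry;
SELECT * FROM registry$history;
SELECT owner, object_type, COUNT(*) FROM dba_objects
 WHERE status = 'INVALID' GROUP BY owner, object_type;
EOF
}
```

The SQL helpers only print text, so they can be reviewed first and then piped into SQL*Plus, for example `verify_sql | sqlplus -S "/ as sysdba"`.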
All the above steps executed successfully on Racdb1/Racdb2, but multiple issues were faced on Racdbdr1/Racdbdr2, as described below
My Oracle Support SRs (raised during the crisis)
3-4716340091: Applying PSU in RAC 10g
3-4697480261: Applying 10.2.0.5 patch to RAC
Notes:
How to Downgrade/Remove Oracle Clusterware (CRS) Patchset Software (Doc ID 754095.1)
INIT.CSSD REMAINS IN STARTCHECK DURING STARTUP (Doc ID 757383.1)
Problems Encountered/Solution:
Problem 1:
Observed the errors below while patching the CRS binaries with the 10.2.0.5 patch set
tar: can't set time on ./Queries21/generalQueries/10.1.0.2.0: Not owner
tar: can't set time on ./Queries21/generalQueries: Not owner
tar: can't set time on ./Queries21/generalQueries/10.1.0.3.0: Not owner
tar: can't set time on ./Queries21/rgsQueries/10.1.0.2.0: Not owner
tar: can't set time on ./Queries21/rgsQueries: Not owner
tar: can't set time on ./Queries21/rgsQueries/10.1.0.3.0: Not owner
tar: can't set time on ./Queries21/ClusterQueries/10.2.0.1.0: Not owner
tar: can't set time on ./Queries21/ClusterQueries: Not owner
tar: can't set time on ./Queries21/globalVarQueries/2.1.0.4.1: Not owner
.
.
On Activity Sequence: 4 (Apply 10.2.0.5 Patch Set on ORA_CRS_HOME)
Problem description:
Errors occurred while applying 10.2.0.5 to ORA_CRS_HOME at the remote binary copy stage
Action Performed:
Re-executed ./runInstaller to apply the patch on ORA_CRS_HOME
Problem Faced:
Binaries got overwritten, various file/folder permissions got changed, and a few utility scripts
got modified, e.g. rootconfig, crsctl etc.
Oracle Support (MOS) Solution:
The error can be ignored, without re-applying the patch
Permissions for all impacted files should have been verified to match the primary node binaries.
If they did not match, they should have been changed accordingly
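One way to do the permission check suggested by MOS is to capture a mode/owner/group listing of the home on a healthy node and on the affected node, then diff the two. The helper names below are hypothetical and the listing format is just a convenient choice; treat it as a minimal sketch rather than the procedure that was followed here.

```shell
#!/bin/bash
# Sketch: compare mode/owner/group of a home between two nodes.
perm_listing() {  # $1 = home directory; prints "mode owner group path" per entry
  ( cd "$1" || exit 1
    find . -print | LC_ALL=C sort | while IFS= read -r f; do
      set -- $(LC_ALL=C ls -ld "$f")   # $1 = mode, $3 = owner, $4 = group
      echo "$1 $3 $4 $f"
    done )
}

perm_diff() {  # $1, $2 = listings captured on the reference and the affected node
  diff "$1" "$2"
}
```

Typical use would be to run `perm_listing "$ORA_CRS_HOME" > /tmp/crs_perms.txt` on each node, copy one listing across, and pass both files to `perm_diff`; an empty diff means modes and ownership match.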
Lessons:
1. Never re-run runInstaller on installation failure
2. Never select ignore & move ahead in runInstaller
On any error, if prompted with ignore/re-try/cancel, select cancel in case
re-try fails to accept the changes made
Selecting cancel rolls back the changes made without leaving the new
binaries in the existing ORACLE/CRS_HOME
Problem 2:
Errors while executing root102.sh
root@Racdbdr2 # /oracle/clusterware/crs/oracle/product/10.2.0/install/root102.sh
Creating pre-patch directory for saving pre-patch clusterware files
Completed patching clusterware files to /oracle/clusterware/crs/oracle/product/10.2.0
Relinking some shared libraries.
ar: writing /oracle/clusterware/crs/oracle/product/10.2.0/lib/libn10.a
ar: writing /oracle/clusterware/crs/oracle/product/10.2.0/lib32/libn10.a
ar: writing /oracle/clusterware/crs/oracle/product/10.2.0/lib/libn10.a
Relinking of patched files is complete.
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
This may take a while on some systems..
.
Timed out waiting for the CRS daemons to start. Look at the
system message file and the CRS log files for diagnostics.
On Activity Sequence: 5 (Perform post patch steps for CRS)
System: Racdbdr1
Problem description:
Unable to perform the root102.sh post-patching activity on CRS
Action Performed:
10.2.0.4 ORA_CRS_HOME restored.
Problem Faced:
Unable to start CRS/DB post restoration
Solution:
Network socket file deletion, permission changes, OCR restore & CRS restart.
(The solution is provided in detail in the subsequent section)
Lessons:
1. Oracle CRS binaries should not be restored using the root OS user
2. Permissions should be verified post restoration
3. OCR is to be restored along with the ORA_CRS_HOME binaries
Problem 3:
Unable to start cluster/database post 10204 ORA_CRS_HOME restoration.
Running services were not getting terminated post CRS stop
root@Racdbdr1 # ps -ef | grep init
root 1 0 0 10:09:55 ? 0:01 /sbin/init
root 27379 2607 0 10:53:41 ? 0:00 /bin/sh /etc/init.d/init.cssd oclsomon
root 20430 1 0 10:43:13 ? 0:00 /bin/sh /etc/init.d/init.crsd run
root 2900 2607 0 10:13:07 ? 0:00 /bin/sh /etc/init.d/init.cssd oprocd
root 2607 1 0 10:13:05 ? 0:11 /bin/sh /etc/init.d/init.cssd fatal
root 3034 1 0 11:03:28 ? 0:00 /bin/sh /etc/init.d/init.evmd run
root 2955 2607 0 10:13:08 ? 0:00 /bin/sh /etc/init.d/init.cssd daemon
root 5119 2036 0 11:06:53 pts/1 0:00 grep init
root@Racdbdr1 # ps -ef | grep d.bin
root 20526 20430 0 10:43:14 ? 0:05 /oracle/clusterware/crs/oracle/product/10.2.0/bin/crsd.bin restart
oracle 3154 3153 0 11:03:30 ? 0:02 /oracle/clusterware/crs/oracle/product/10.2.0/bin/evmd.bin
oracle 3118 2955 0 10:13:08 ? 0:08 /oracle/clusterware/crs/oracle/product/10.2.0/bin/ocssd.bin
root 3051 2900 0 10:13:08 ? 0:00 /oracle/clusterware/crs/oracle/product/10.2.0/bin/oprocd.bin run -t 1000 -m 500
ocssd.log:
---------
[ CSSD]2011-10-16 11:34:13.634 [14] >TRACE: clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2011-10-16 11:34:13.634 [14] >ERROR: clssgmclientlsnr: listening failed for (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1)) (3)
[ CSSD]2011-10-16 11:34:13.634 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2011-10-16 11:34:13.634 [14] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_Racdbdr1_crs))
System: Racdbdr1
Problem description:
After problem 1, the old (10.2.0.4) ORA_CRS_HOME was restored in order to restart the installation.
After the CRS home restoration, the CRS/CSS/EVM services were not starting
Action Performed:
How to Downgrade/Remove Oracle Clusterware (CRS) Patchset Software (Doc ID 754095.1)
1. CRS disabled: "crsctl stop crs"
2. Server rebooted
3. Cluster de-install & rebuild
Remove the socket files in /tmp/.oracle or /var/tmp/.oracle, reboot the OS
Stopped clusterware on both nodes
$ORA_CRS_HOME/install/rootdelete.sh on both nodes
$ORA_CRS_HOME/install/rootdeinstall.sh on the first node
$ORA_CRS_HOME/root.sh on node 1 and then on node 2
4. OCR restored from pre-activity backup
ocrconfig -showbackup
ocrconfig -restore <ocr_file_name>
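The socket cleanup and OCR restore from the steps above can be sketched as follows. The function name, the dry-run switch and the example backup file name are assumptions added for illustration; the commands themselves (`crsctl stop crs`, `ocrconfig -showbackup`, `ocrconfig -restore`) are the ones listed above and need to be run as root with the clusterware down on all nodes.

```shell
#!/bin/bash
# Sketch of the socket cleanup and OCR restore; run as root.
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

clean_and_restore_ocr() {  # $1 = socket directory, $2 = OCR backup file
  run "$ORA_CRS_HOME/bin/crsctl" stop crs
  run find "$1" -type s -exec rm -f {} \;     # remove only socket files
  run "$ORA_CRS_HOME/bin/ocrconfig" -showbackup
  run "$ORA_CRS_HOME/bin/ocrconfig" -restore "$2"
}
```

For example, `clean_and_restore_ocr /var/tmp/.oracle /path/to/backup00.ocr` prints the four commands for review (the backup path here is a placeholder); setting `DRY_RUN=0` executes them.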
Tried to start Cluster on node 1
$ORA_CRS_HOME/root.sh
WARNING: directory '/oracle' is not owned by root
"/dev/rdsk/c2t40d2s5" does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
"/dev/rdsk/c2t40d3s5" does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
"/dev/rdsk/c2t40d4s5" does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
Action Performed:
Verified $CRS_HOME/install/paramfile.crs & updated $CRS_HOME/install/rootconfig accordingly.
ORA_CRS_HOME=/oracle/clusterware/crs/oracle/product/10.2.0
CRS_ORACLE_OWNER=oracle
CRS_DBA_GROUP=oinstall
CRS_VNDR_CLUSTER=false
CRS_OCR_LOCATIONS=/dev/rdsk/c2t40d0s5,/dev/rdsk/c2t40d1s5
CRS_CLUSTER_NAME=crs
CRS_HOST_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_NODE_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_PRIVATE_NAME_LIST=Racdbdr1-priv,1,Racdbdr2-priv,2
CRS_LANGUAGE_ID='AMERICAN_AMERICA.WE8ISO8859P1'
CRS_VOTING_DISKS=/dev/rdsk/c2t40d2s5,/dev/rdsk/c2t40d3s5,/dev/rdsk/c2t40d4s5
CRS_NODELIST=Racdbdr1,Racdbdr2
CRS_NODEVIPS='Racdbdr1/Racdbdr1-vip/255.255.255.192/ce0,Racdbdr2/Racdbdr2-vip/255.255.255.192/ce0'
Corrected to
root@Racdbdr1 # cat /oracle/clusterware/crs/oracle/product/10.2.0/bin/install/paramfile.crs
ORA_CRS_HOME=/oracle/clusterware/crs/oracle/product/10.2.0
CRS_ORACLE_OWNER=root
CRS_DBA_GROUP=oinstall
CRS_VNDR_CLUSTER=false
CRS_OCR_LOCATIONS=/dev/rdsk/ocr15,/dev/rdsk/ocr25
CRS_CLUSTER_NAME=crs
CRS_HOST_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_NODE_NAME_LIST=Racdbdr1,1,Racdbdr2,2
CRS_PRIVATE_NAME_LIST=Racdbdr1-priv,1,Racdbdr2-priv,2
CRS_LANGUAGE_ID='AMERICAN_AMERICA.WE8ISO8859P1'
CRS_VOTING_DISKS=/dev/rdsk/ocr35,/dev/rdsk/ocr45,/dev/rdsk/ocr55
CRS_NODELIST=Racdbdr1,Racdbdr2
CRS_NODEVIPS='Racdbdr1/Racdbdr1-vip,Racdbdr2/Racdbdr2-vip'
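Spotting which settings differ between two such files is easier with a small comparison helper than by eye. The sketch below assumes both files keep their settings as plain KEY=VALUE lines, as in the listings above; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: report KEY=VALUE settings that differ between two CRS config files.
param_value() {  # $1 = file, $2 = key; prints the value of the first matching line
  sed -n "s/^$2=//p" "$1" | head -n 1
}

compare_params() {  # $1, $2 = files, remaining args = keys to compare
  a=$1; b=$2; shift 2
  for key in "$@"; do
    va=$(param_value "$a" "$key")
    vb=$(param_value "$b" "$key")
    [ "$va" = "$vb" ] || echo "$key differs: '$va' vs '$vb'"
  done
}
```

Calling `compare_params old.crs new.crs CRS_OCR_LOCATIONS CRS_VOTING_DISKS CRS_NODEVIPS` on saved copies of the two listings would flag the device paths and VIP definitions that changed.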
O/p:
----------------rootdelete.sh--------------------
root@Racdbdr1 # $ORA_CRS_HOME/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
Stopping resources. This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/var/opt/oracle/scls_scr'
root@Racdbdr1 #
-----------------------------rootdeinstall.sh----------------------------------
root@Racdbdr1 # rootdeinstall.sh
Removing contents from OCR mirror device
2560+0 records in
2560+0 records out
Removing contents from OCR device
2560+0 records in
2560+0 records out
-------------------root.sh----------------------------------
root@Racdbdr1 # $ORA_CRS_HOME/root.sh
WARNING: directory '/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/oracle' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: Racdbdr1 Racdbdr1-priv Racdbdr1
node 2: Racdbdr2 Racdbdr2-priv Racdbdr2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/rdsk/ocr35
Now formatting voting device: /dev/rdsk/ocr45
Now formatting voting device: /dev/rdsk/ocr55
Format of 3 voting devices complete.
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
Racdbdr1
CSS is inactive on these nodes.
Racdbdr2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
root@Racdbdr1 #
Problem 4:
Unable to start Oracle/RDBMS services as the oracle user
oracle@Racdbdr1$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....dr1.gsd application    ONLINE    ONLINE    Racdbdr1
ora....dr1.ons application    ONLINE    ONLINE    Racdbdr1
ora....dr1.vip application    ONLINE    ONLINE    Racdbdr1
ora....SM2.asm application    ONLINE    UNKNOWN   Racdbdr2
ora....dr2.gsd application    ONLINE    ONLINE    Racdbdr2
ora....dr2.ons application    ONLINE    ONLINE    Racdbdr2
ora....dr2.vip application    ONLINE    ONLINE    Racdbdr2
oracle@Racdbdr1$ srvctl start asm -n Racdbdr1 -i +ASM1 -o mount
PRKS-1009 : Failed to start ASM instance "+ASM1" on node "Racdbdr1", [CRS-1028: Dependency analysis failed because of:
CRS-0223: Resource 'ora.Racdbdr1.ASM1.asm' has placement error.]
System: Racdbdr1
Problem description:
After rebuilding the cluster, we were unable to add new services such as db, tns and
instance to the cluster services as the oracle OS user
Action Performed:
srvctl remove & srvctl add service commands executed to remove and re-add the
services already added to the cluster as root.
root@Racdbdr1 # crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora.DRNMS.db   application    OFFLINE   OFFLINE
ora....s1.inst application    ONLINE    UNKNOWN   Racdbdr1
ora....s2.inst application    ONLINE    OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    Racdbdr1
ora....R1.lsnr application    ONLINE    ONLINE    Racdbdr1
ora....dr1.gsd application    ONLINE    ONLINE    Racdbdr1
ora....dr1.ons application    ONLINE    ONLINE    Racdbdr1
ora....dr1.vip application    ONLINE    ONLINE    Racdbdr1
ora....SM2.asm application    ONLINE    ONLINE    Racdbdr2
ora....R2.lsnr application    ONLINE    ONLINE    Racdbdr2
ora....dr2.gsd application    ONLINE    ONLINE    Racdbdr2
ora....dr2.ons application    ONLINE    ONLINE    Racdbdr2
ora....dr2.vip application    ONLINE    ONLINE    Racdbdr2
Problem 5:
Unable to execute root102.sh post 10205 CRS patching
root@Racdbdr1 # /oracle/clusterware/crs/oracle/product/10.2.0/install/root102.sh
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
This may take a while on some systems.
.
.
.
.
Timed out waiting for the CRS daemons to start. Look at the
system message file and the CRS log files for diagnostics.
System: Racdbdr1
Problem description:
While running the post-install script after applying the 10205 CRS patch, we were unable to start the
cluster
Action Performed:
1. Cluster stopped.
crsctl stop crs -f, removed /var/tmp/.oracle, started the cluster on both nodes
2. Ownership changed to oracle:oinstall from root:oinstall for the directories below.
chown -R oracle:oinstall oui install ccr inventory odbc log diagnostics OPatch jre JRE
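Before and after a chown like the one above, it can help to list which of those directories are not owned as expected. This is a minimal sketch with a hypothetical helper name; it only reads ownership via `ls -ld` and changes nothing.

```shell
#!/bin/bash
# Sketch: report subdirectories whose owner:group is not the expected one.
check_owner() {  # $1 = home, $2 = expected owner:group, remaining args = subdirectories
  home=$1; expected=$2; shift 2
  for d in "$@"; do
    [ -e "$home/$d" ] || continue    # skip names that do not exist in this home
    actual=$(LC_ALL=C ls -ld "$home/$d" | awk '{print $3 ":" $4}')
    [ "$actual" = "$expected" ] || echo "$d is $actual, expected $expected"
  done
}
```

For the case above the call would be `check_owner "$ORA_CRS_HOME" oracle:oinstall oui install ccr inventory odbc log diagnostics OPatch jre JRE`; no output means every listed directory already has the expected ownership.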
root@Racdbdr1 # root102.sh
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
This may take a while on some systems.
.
10205 patch successfully applied.
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 1 values from OCR.
Successfully deleted 1 keys from OCR.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: Racdbdr1 Racdbdr1-priv Racdbdr1
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
clscfg -upgrade completed successfully
Creating '/oracle/clusterware/crs/oracle/product/10.2.0/install/paramfile.crs' with data used for CRS configuration
Setting CRS configuration values in /oracle/clusterware/crs/oracle/product/10.2.0/install/paramfile.crs