We decided to upgrade our existing two-node Oracle RAC installation to the latest SLES10 SP2 release. At that time we had a stable SLES10 64-bit production environment. The first problem was registering both nodes with the Novell Customer Center in YaST2, but that is a separate issue and I will post the solution in a separate article.
The starting point was a SLES10 SP1 64-bit installation. First we installed all available patches and software updates. Then we did a fast and easy upgrade to SP2 with the following script:
#!/bin/bash
set -x
rug in -y -t patch slesp1-libzypp        # For SLED use sledp1-libzypp
sleep 40 && rug ping -a
rug in -y -t patch move-to-sles10-sp2    # For SLED use move-to-sled10-sp2
rug refresh && rug ping -a
rug up -y -t patch
sleep 240 && rug ping -a
# For an automatic reboot after the migration, you may change 'n' to 'y'
rug up -y --agree-to-third-party-licences -t patch <<EOF
n
EOF
The whole upgrade process went really smoothly, but after booting the new kernel the CRS daemon of the Oracle cluster did not start. The first problem was found very quickly: the SLES10 upgrade silently overwrites the file /etc/udev/rules.d/50-udev-default.rules. This file contains a very important entry if you are running Oracle RAC with ASM on raw devices. So we modified the config file and replaced the line
KERNEL=="raw[0-9]*", SUBSYSTEM=="raw", NAME="raw/%k", GROUP="disk"
with the following one
KERNEL=="raw*", SUBSYSTEM=="raw", NAME="raw/%k", OWNER="oracle", GROUP="dba", MODE="660"
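Since the upgrade replaced this file without warning, it can help to check for the entry again after every update. The following is only a minimal sketch: the function name has_oracle_raw_rule is hypothetical, and it assumes the rule is written on a single line with OWNER="oracle" and GROUP="dba" as shown above.

```shell
#!/bin/bash
# Sketch (assumption: the Oracle rule sits on one line, as in this article).
# Returns 0 if the given udev rules file still contains a raw-device rule
# that assigns owner "oracle" and group "dba", and non-zero otherwise.
has_oracle_raw_rule() {
    local rules_file="$1"
    # First keep only rules for the raw subsystem, then require both
    # the OWNER and the GROUP assignment on the same line.
    grep -E '^KERNEL=="raw[^"]*".*SUBSYSTEM=="raw"' "$rules_file" 2>/dev/null \
        | grep -F 'OWNER="oracle"' \
        | grep -qF 'GROUP="dba"'
}
```

It could be called as has_oracle_raw_rule /etc/udev/rules.d/50-udev-default.rules || echo "Oracle raw rule is missing" right after the upgrade and before the first reboot.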
After restarting the system, all raw partitions were accessible to the oracle user, but the CRS daemon still did not come up and the process list showed:
root  6929     1  0 13:56 ?  00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root  6960  6928  0 13:56 ?  00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root  6963  6929  0 13:56 ?  00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
root  7064  6935  0 13:56 ?  00:00:00 /bin/sh /etc/init.d/init.cssd startcheck
OK, I thought, “time to take a look at the logs”:
bash$# tail /var/log/messages
Nov 20 09:20:37 orac1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4822.
Nov 20 09:20:37 orac1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4746.
Nov 20 09:20:37 orac1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4871.
This sounded very encouraging, but a closer look at these files was frustrating, because they were all empty (zero bytes). My next idea was to check the error message when I manually start the CRS daemon as root with the command:
 1  bash$# sh -x /etc/init.d/init.cssd startcheck
 2  + /etc/init.d/init.cssd runcheck
 3  + STATUS=0
 4  + '[' 0 '!=' 0 ']'
 5  + '[' -f /opt/oracle/10.2/crs/lib/INVALID_DIRECTORY ']'
 6  + '[' '!' -r /opt/oracle/10.2/crs/bin/crsctl ']'
 7  + '[' '' = CSS ']'
 8  + /bin/su -l oracle -c '/opt/oracle/10.2/crs/bin/crsctl check boot > /tmp/crsctl.18545'
 9  /bin/sh: /opt/oracle/10.2/crs/bin/crsctl: Keine Berechtigung
10  + RC=126
11  + '[' 126 '!=' 0 ']'
12  + /bin/logger -puser.err Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.18545.
13  + /bin/sleep 60
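Line 9 of this output (“Keine Berechtigung”, German for “Permission denied”) gives an important hint, and the RC=126 that follows is the shell's usual status for a command that was found but could not be executed. So I checked the permissions of the file involved: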
bash$# ls -la /opt/oracle/10.2/crs/bin/crsctl
-r-xr-x--x 1 root dba 2002 2007-08-08 15:37 /opt/oracle/10.2/crs/bin/crsctl
This looked fine, and after wasting some hours I decided to check the group membership of the oracle user (the older output comes from a German locale, where “Gruppen” means “groups”):
### after SLES10 SP2 installation ###
bash$# id oracle
uid=102(oracle) gid=103(oinstall) groups=103(oinstall)
### before SLES10 SP2 installation ###
uid=102(oracle) gid=104(dba) Gruppen=104(dba),6(disk),103(oinstall)
I could not believe my eyes, but the SLES10 SP2 upgrade procedure had removed the primary group dba (and the supplementary group disk) from the oracle user, leaving only oinstall. Without dba membership, only the "other" permission bits of crsctl (--x, no read access) apply to oracle, so the user is not allowed to start the CRS daemon. The solution to this problem was quite simple: execute the following command as superuser and reboot the machine. The CRS daemon should then start automatically and all databases should come up without any problems:
bash$# usermod -g dba -G oinstall,disk oracle
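To confirm the fix, or to catch the same regression after a future update, the group membership can be checked from a script. This is a rough sketch: the function names primary_group_is and in_group are made up for this example, and it relies only on the standard id -gn and id -Gn options.

```shell
#!/bin/bash
# Sketch with hypothetical helper names; not part of any Oracle or Novell tool.

# Returns 0 if the primary group of user $1 is named $2.
primary_group_is() {
    [ "$(id -gn "$1" 2>/dev/null)" = "$2" ]
}

# Returns 0 if user $1 is a member of group $2 (primary or supplementary).
in_group() {
    # id -Gn prints all group names on one line, separated by spaces.
    id -Gn "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}
```

For the setup in this article the check would be: primary_group_is oracle dba && in_group oracle disk && in_group oracle oinstall || echo "oracle group membership has changed".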
Conclusion:
The upgrade procedure to SLES10 SP2 was really straightforward, but there are some pitfalls which are not documented by Novell and which are very hard to track down. My advice is to do a SLES10 SP2 upgrade on a non-production Oracle RAC environment first.
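Both pitfalls described here were files that changed silently, so one way to shorten the search is to save copies of the relevant files before the upgrade and compare them afterwards. The following is only a sketch of that idea, not something we actually ran: the function names and the snapshot directory are hypothetical.

```shell
#!/bin/bash
# Sketch with hypothetical helper names: snapshot files before an upgrade
# and report which of them differ afterwards.

# $1 = snapshot directory, remaining arguments = files to save.
snapshot_files() {
    local dest="$1"; shift
    mkdir -p "$dest" || return 1
    local f
    for f in "$@"; do
        # Flatten the path into a file name, e.g. /etc/group -> _etc_group
        cp -p "$f" "$dest/$(printf '%s' "$f" | tr '/' '_')" || return 1
    done
}

# $1 = snapshot directory, remaining arguments = files to compare.
# Prints every file whose current content differs from its saved copy.
changed_files() {
    local dest="$1"; shift
    local f
    for f in "$@"; do
        cmp -s "$f" "$dest/$(printf '%s' "$f" | tr '/' '_')" || echo "$f"
    done
}
```

Before the upgrade one could run snapshot_files /root/pre-sp2 /etc/udev/rules.d/50-udev-default.rules /etc/passwd /etc/group, and afterwards changed_files with the same arguments to list what the upgrade touched (the directory /root/pre-sp2 is just an example).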
We have to use SLES10 because we don’t want to lose Oracle support or the support for a third-party application which is certified for Oracle on SLES10. But next time I will propose to my boss that we buy Red Hat Enterprise Linux or use an open-source enterprise Linux distribution like CentOS. The package management of Novell/SUSE is very frustrating if you are a confirmed Ubuntu user. Please feel free to comment on this post or contact me if you have any questions.
References:
- Oracle Meta-Link NOTE:329450.1
- Novell Support: How to update to SLES/SLED 10 SP2