Feb 04

As a rule, it is a good idea to keep the iLOM network off the public network. To date, iLOM has no built-in firewall to block unauthorized login attempts from the outside world. In my case I put all my iLOMs in a separate private network, 10.5.1.x / 255.255.255.0. The iLOM network factory setting is usually DHCP, but I decided to give the SP a fixed IP address, so I have to reconfigure the iLOM network settings through the serial console. Below I explain how to set up the network of the iLOM SP over a serial console connection.

First, establish a serial connection to the SER MGT connector (use the Ethernet-to-serial adapter that ships with the SUN machine). Make sure that all settings in minicom are correct (8N1, 9600 baud, hardware flow control (CTS/RTS) disabled) and log in to iLOM with your root account and password (the factory password for the iLOM SP is changeme). The following command sequence lets you change the network settings to a fixed IP address in a private network (in my example I use 10.5.1.x / 24).

cd /SP/console
start
cd /SP/network
set pendingipaddress=10.5.1.111
set pendingipnetmask=255.255.255.0
set pendingipgateway=10.5.1.254
set commitpending=true

That's all. Now you can use the assigned IP address to log in to the iLOM's web GUI or connect to iLOM directly via SSH.
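Before committing the pending settings it can be worth checking that the gateway really sits in the same network as the new address. Here is a minimal sketch for the /24 netmask used in my example; the helper name same_slash24 is made up for illustration, and the check only works for 255.255.255.0, not for other netmasks.

```shell
#!/bin/sh
# Hypothetical helper (not part of iLOM): succeeds if two dotted-quad
# IPv4 addresses share the same first three octets, i.e. the same /24.
same_slash24() {
    # Strip the last octet from each address and compare the remainder.
    [ "${1%.*}" = "${2%.*}" ]
}

# Values taken from the example above.
if same_slash24 10.5.1.111 10.5.1.254; then
    echo "gateway is on the same /24 network"
else
    echo "gateway is NOT on the same /24 network"
fi
```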

written by phi.mic \\ tags: ,

Dec 16

To open a case with SUN's customer care center you need at least the system name and the system serial number. But sometimes you don't have access to the server room (e.g. because of an alarm system). In that case it is very easy to get the serial number with the ipmitool command-line tool. Just enter the following command at your workstation, replacing {IP} with the IP address of your iLOM service processor and {PW} with the iLOM's root password:

ipmitool -v -H {IP} -U root -P {PW} fru | grep /SYS -A 5

In the output you will find all the information required to open a case.

FRU Device Description : /SYS (ID 20)
 Product Manufacturer  : SUN MICROSYSTEMS
 Product Name          : SUN FIRE X4200 M2   
 Product Part Number   : 602-3891-01
 Product Serial        : 0116SL46F2
 Product Extra         : 080020FFFFFFFFFFFFFF00144F7DB00A

Please make sure that all IPMI-related modules are loaded on the server:

root@myserver:~ # lsmod | grep ipmi
ipmi_devintf           27024  0 
ipmi_si                53776  0 
ipmi_msghandler        47476  2 ipmi_devintf,ipmi_si

In case you just need the serial number of a machine you can use this command:

ipmitool -v -H {IP} -U root -P {PW} fru | grep /SYS -A 5 | grep Serial | awk '{print $4}'
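The grep / awk chain above depends on the serial number showing up within five lines of the /SYS header. As an alternative, here is a small sketch that does the same job in a single awk pass. The function name fru_sys_serial is my own invention, and I assume the FRU output looks like the sample shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: read "ipmitool fru" output on stdin and print the
# "Product Serial" value of the /SYS record only.
fru_sys_serial() {
    awk '
        # Every FRU record starts with a description line; remember
        # whether the current record is the /SYS one.
        /^FRU Device Description/ { in_sys = ($0 ~ /: \/SYS /) }
        in_sys && /Product Serial/ {
            sub(/^[^:]*:[ \t]*/, "")   # drop the label and the colon
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            print
            exit
        }
    '
}

# Demo with part of the sample output from this post.
fru_sys_serial <<'EOF'
FRU Device Description : /SYS (ID 20)
 Product Manufacturer  : SUN MICROSYSTEMS
 Product Serial        : 0116SL46F2
EOF
```

With a real machine you would pipe the ipmitool output from the command above into fru_sys_serial instead of using the here-document.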

written by phi.mic \\ tags:

Dec 04

 

All recent Linux kernels have a nice feature: recoverable machine check errors are logged to /var/log/mcelog. Normally the file is empty (zero bytes). In the worse case the file has content like this:

orac1:~ # cat /var/log/mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC 5183f1d463078 
MISC c0090fff00000000 ADDR 170617520 
  Northbridge Chipkill ECC error
  Chipkill ECC syndrome = 5d31
       bit46 = corrected ecc error
       bit59 = misc error valid
       bit62 = error overflow (multiple errors)
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS dc18c0005d080a13 MCGSTATUS 0

This looks like an ECC memory problem, so I first tried to determine which memory module was faulty. For the SUN X4100/4200 machines there is a very nice tool named herd, which identifies the location of the faulty RAM module. You can download this tool from the Sun website. Select the download link with the "Tools and Driver CD"; on the next page choose "Tools and Driver" from the platform drop-down menu, then choose "Linux-only Files from Tools and Drivers CD". Make sure that your ILOM firmware version is compatible with the downloaded herd release.

Extract the archive and install the binary package herd-[version].[distrib].[arch].rpm for your enterprise Linux distribution (I use SLES10):

orac1:~ # tar -xjf linux_X4100M2_21.tar.bz2
orac1:~ # cd linux/tools/herd/
orac1:~ #  rpm -ivh herd-1.9.2-1.sl10.x86_64.rpm 
Preparing...                ########################################### [100%]
   1:herd                   ########################################### [100%]
herd                      0:off  1:off  2:off  3:on   4:off  5:on   6:off

Now try to identify the faulty memory module by passing the ADDR value from the mcelog output to herd:

orac1:~ # herd -De 170617520
000170617520: Cpu Node 1, DIMM pair 0 (DIMMs 0 and 1)

The tool now gives you the information required to replace the faulty RAM module with a working one.
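If the log contains several entries, copying each ADDR value by hand gets tedious. Here is a rough sketch that pulls every distinct ADDR value out of the log and feeds it to herd. The function name mce_addrs is hypothetical, and I assume every entry carries its address as "ADDR <value>" like the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct value that follows the
# keyword ADDR in mcelog output read from stdin.
mce_addrs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "ADDR") print $(i + 1) }' | sort -u
}

# Only run the lookup if the log is non-empty and herd is installed.
if [ -s /var/log/mcelog ] && command -v herd >/dev/null 2>&1; then
    mce_addrs < /var/log/mcelog | while read -r addr; do
        herd -De "$addr"
    done
fi
```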

written by phi.mic \\ tags: