Oct 28
Once you have set up a fixed IP address for your iLOM SP and can no longer reach the web interface, you have to switch iLOM back to DHCP from the serial console.
First connect to the serial port of your SP with kermit, HyperTerminal or your preferred terminal application. The connection parameters are: 8N1 (eight data bits, no parity, one stop bit), 9600 baud, hardware flow control (CTS/RTS) disabled. Enter the following commands to set the iLOM network to DHCP.
Sun(TM) Integrated Lights Out Manager
Version 2.0.2.16
Copyright 2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
> cd /SP/network
/SP/network
> set pendingipdiscovery=dhcp
Set 'pendingipdiscovery' to 'dhcp'
> set commitpending=true
> exit
written by phi.mic
\\ tags: iLOM
Feb 04
In general it is a good idea to keep the iLOM network off the public network. To date iLOM has no firewall feature to block unauthorized login attempts from the outside world. In my case I put all my iLOMs in a separate private network, 10.5.1.x / 255.255.255.0. The iLOM network factory setting is usually DHCP, but I decided to give the SP a fixed IP address. Thus I have to reconfigure the iLOM network settings through the serial console. Below I explain how to set up the network of the iLOM SP over a serial console connection.
First you have to establish a serial connection to the SER MGT connector (use the ethernet-to-serial adapter shipped with the SUN machine). Make sure that all settings in minicom are correct (8N1, 9600 baud, hardware flow control (CTS/RTS) disabled) and log in to iLOM with the root account and its password (the factory password for the iLOM SP is changeme). The following command sequence lets you change the network settings to a fixed IP address in a private network (in my example I use 10.5.1.x / 24).
cd /SP/console
start
cd /SP/network
set pendingipdiscovery=static
set pendingipaddress=10.5.1.111
set pendingipnetmask=255.255.255.0
set pendingipgateway=10.5.1.254
set commitpending=true
That's all. Now you can use the assigned IP address to log in to the iLOM WebGUI or connect directly to iLOM via SSH.
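A wrong gateway or netmask is easy to commit and hard to undo without the serial cable, so a quick sanity check on the workstation can help before typing the values in. The following is only a sketch in POSIX shell; the helper names `ip_to_int` and `same_subnet` are made up for this example and are not part of iLOM or any tool mentioned here.

```shell
# Convert a dotted-quad IPv4 address to a decimal integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 and address $2 share a subnet under netmask $3.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Values from the example above: SP address, gateway, netmask.
if same_subnet 10.5.1.111 10.5.1.254 255.255.255.0; then
    echo "gateway is reachable from the SP address"
else
    echo "gateway is outside the SP subnet"
fi
```

The check only compares the masked addresses; it does not contact the SP or the gateway.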
written by phi.mic
\\ tags: iLOM, SUN Fire X4200
Dec 16
To open a case at SUN's customer care center you need at least the System Name and the System Serial Number. But sometimes you don't have access to the server room (e.g. because of the alarm system). In such a case it is very easy to get the serial number with the ipmitool command line tool. Just enter the following command at your workstation and replace {IP} with the IP address of your iLOM service processor and {PW} with the iLOM's root password:
ipmitool -v -H {IP} -U root -P {PW} fru | grep /SYS -A 5
In the output you will find all the information required to open a case.
FRU Device Description : /SYS (ID 20)
Product Manufacturer : SUN MICROSYSTEMS
Product Name : SUN FIRE X4200 M2
Product Part Number : 602-3891-01
Product Serial : 0116SL46F2
Product Extra : 080020FFFFFFFFFFFFFF00144F7DB00A
Please make sure that all IPMI-related modules on the server are loaded:
root@myserver:~ # lsmod | grep ipmi
ipmi_devintf 27024 0
ipmi_si 53776 0
ipmi_msghandler 47476 2 ipmi_devintf,ipmi_si
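The module check above can also be scripted. This is a small sketch that reads `lsmod` output and prints the names of the three modules from the listing above that are not loaded; the function name `missing_ipmi_modules` is hypothetical.

```shell
# Read lsmod output on stdin and print every expected IPMI module
# that does not appear in the first column (the module name).
missing_ipmi_modules() {
    awk -v want="ipmi_devintf ipmi_si ipmi_msghandler" '
        { loaded[$1] = 1 }
        END {
            n = split(want, mods, " ")
            for (i = 1; i <= n; i++)
                if (!(mods[i] in loaded)) print mods[i]
        }'
}

# Usage on the server (prints nothing when all three are loaded):
# lsmod | missing_ipmi_modules
```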
In case you just need the serial number of a machine you can use this command:
ipmitool -v -H {IP} -U root -P {PW} fru | grep /SYS -A 5 | grep Serial | awk '{print $4}'
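The grep/awk chain depends on the serial number sitting within five lines of the /SYS header and on the exact column count. As an alternative sketch, the same lookup can be done in a single awk pass that only looks at the record whose description is exactly /SYS. The function name `fru_serial` is made up for this example, and it assumes the "Name : Value" layout shown in the output above.

```shell
# Read `ipmitool fru` output on stdin and print the Product Serial
# of the FRU record whose description is /SYS (not /SYS/... sub-records).
fru_serial() {
    awk -F' *: *' '
        /FRU Device Description/ { in_sys = ($2 ~ /^\/SYS( |$)/) }
        in_sys && /Product Serial/ { print $2; exit }
    '
}

# Usage, with the same placeholders as above:
# ipmitool -v -H {IP} -U root -P {PW} fru | fru_serial
```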
written by phi.mic
\\ tags: SUN Fire X4200
Dec 04
All recent Linux kernels have a nice feature: recoverable machine check errors are logged to /var/log/mcelog. Normally the file is empty (zero bytes). In the worse case it has content like this:
orac1:~ # cat /var/log/mcelog
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 4 northbridge TSC 5183f1d463078
MISC c0090fff00000000 ADDR 170617520
Northbridge Chipkill ECC error
Chipkill ECC syndrome = 5d31
bit46 = corrected ecc error
bit59 = misc error valid
bit62 = error overflow (multiple errors)
bus error 'local node response, request didn't time out
generic read mem transaction
memory access, level generic'
STATUS dc18c0005d080a13 MCGSTATUS 0
This looks like an ECC memory problem. So I first tried to determine which memory module was faulty. For the SUN X4100/4200 machines there is a very nice tool named herd, which lets you identify the location of the faulty RAM module. You can download this tool from the Sun website. Select the download link with the "Tools and Driver CD"; on the next page choose "Tools and Driver" from the platform drop-down menu. Finally choose "Linux-only Files from Tools and Drivers CD". Make sure that your ILOM firmware version is compatible with the downloaded herd release.
Extract the archive and install the binary file herd-[version].[distrib].[arch].rpm for your enterprise Linux distribution (I use SLES10):
orac1:~ # tar -xjf linux_X4100M2_21.tar.bz2
orac1:~ # cd linux/tools/herd/
orac1:~ # rpm -ivh herd-1.9.2-1.sl10.x86_64.rpm
Preparing... ########################################### [100%]
1:herd ########################################### [100%]
herd 0:off 1:off 2:off 3:on 4:off 5:on 6:off
Now try to identify the faulty memory module by passing the ADDR value from the mcelog output to herd:
orac1:~ # herd -De 170617520
000170617520: Cpu Node 1, DIMM pair 0 (DIMMs 0 and 1)
The tool now gives you the information required to replace the faulty RAM module with a working one.
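If the log holds more than one entry, copying each ADDR value by hand gets tedious. The sketch below pulls every distinct ADDR value out of mcelog-style text so it can be fed to herd in a loop. The function name `mce_addrs` is made up for this example, and it assumes the "ADDR <value>" layout shown in the log excerpt above.

```shell
# Read mcelog-style text on stdin and print each distinct value
# that follows the keyword ADDR, in order of first appearance.
mce_addrs() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "ADDR" && !seen[$(i + 1)]++)
                print $(i + 1)
    }'
}

# Usage on the affected machine (requires herd to be installed):
# mce_addrs < /var/log/mcelog | while read -r addr; do herd -De "$addr"; done
```

Repeated errors on the same address are reported only once, so each address is passed to herd a single time.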
written by phi.mic
\\ tags: SUN Fire X4200