PSOD : LINT1 motherboard interrupt

I had a Dell PowerEdge R815 host crash yesterday with the following PSOD (purple screen of death) error message:

The system has found a problem on your machine and cannot continue.
LINT1 motherboard interrupt. This is a hardware problem; please contact your hardware vendor.


When I checked the system logs on the iDRAC, I could see a bus fatal error logged:

System Event Logs

Severity   Time       Description
Critical   18:24:36   The watchdog timer expired.
Normal     18:16:37   An OEM diagnostic event has occurred.
Critical   18:16:36   A bus fatal error was detected on a component at bus 4 device 4 function 0.

I ran the integrated hardware diagnostics from System Services at boot (F10), which confirmed these errors, but only because they read the system logs. I find this really annoying: if I had cleared the event logs before running the diagnostics, no errors would have been reported, and now I’m not sure whether the hardware is actually faulty. Here are the reported errors:

Watchdog Sensor
PCIe Fatal Error

Either way, I can’t put it back into production without further analysis, and I need to find out which hardware component is located at bus 4, device 4, function 0 so that I can log a support call with Dell. It turns out this is really easy using the lspci command on the host, which returns detailed information on all PCI devices.

lspci prints each device address in [domain]:[bus]:[device].[function] format, so it’s easy to grep for that address and see only the component in question instead of every PCI device in the host. Here is what mine returned:


~ # lspci | grep '000:004:04.0'
000:004:04.0 Bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]

~ # lspci --help
lspci   -p --pciinfo   Prints detailed info on all PCI devices
        -n --nolookup  Don't look up PCI device names and info
        -d --dump      Print hex dump of the full config space
        -v --verbose   Verbose information
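If you need to do this more than once, the translation from the iDRAC's "bus 4 device 4 function 0" wording to the lspci address can be scripted. This is a minimal sketch: pci_addr is a hypothetical helper of my own, not an ESXi command, and it assumes domain 0, decimal bus/device/function numbers, and the zero-padded layout shown in the output above (3-digit domain, 3-digit bus, 2-digit device, 1-digit function). I haven't checked how bus numbers above 9 are rendered on every ESXi build, so compare a match against the full lspci listing before trusting it.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): turn the bus/device/function numbers
# from an iDRAC SEL entry into the address string lspci prints.
# Assumption: decimal input and the zero-padded layout seen in the ESXi output
# (3-digit domain, 3-digit bus, 2-digit device, 1-digit function).
# Usage: pci_addr BUS DEVICE FUNCTION [DOMAIN]   (DOMAIN defaults to 0)
pci_addr() {
    printf '%03d:%03d:%02d.%d\n' "${4:-0}" "$1" "$2" "$3"
}

# "bus 4 device 4 function 0" from the SEL entry
addr=$(pci_addr 4 4 0)
echo "$addr"    # prints 000:004:04.0

# On the host, filter the lspci output for that address
# (-F makes grep treat the dots literally instead of as wildcards):
# lspci | grep -F "$addr"
```

I used printf rather than gluing strings together so all of the zero padding is handled in one format string; if your lspci build pads differently, that format string is the only thing to change.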

So now I know the problem was with the PCI Express switch (a PLX PEX 8624 bridge), and I can log this with Dell in the hope that they simply replace the component under warranty.


FIX : PSOD on hosts with AMD Opteron 62xx series processors

If you’re running ESX/ESXi on either HP or Dell hosts with AMD Opteron 62xx series processors and were affected by the PSOD issue, then you will be happy to know that both vendors have now released BIOS updates to address this. My understanding is that this was actually a problem with the AMD microcode rather than a VMware, HP or Dell issue.

I was affected by this on Dell PowerEdge R815 servers immediately after upgrading the BIOS on my hosts from a mix of versions 2.8.2 and 2.9.0 to version 3.0.4. Until now, the workaround was to downgrade the BIOS on all hosts back to version 2.8.2.

Dell PowerEdge R815 Hosts

Item                         Value
Processor Model              AMD Opteron(tm) Processor 6276
Processor Speed              2.3 GHz
Processor Sockets            4
Processor Cores per Socket   16
Logical Processors           64
Memory                       256 GB

Here are the links to the vendor advisories and updates:

Interestingly, Dell have only flagged the importance of this upgrade as “Recommended”. Not sure about you, but I quite like my hosts to be up and running!
