I had a Dell R815 host crash yesterday with the following PSOD (purple screen of death) error message:
The system has found a problem on your machine and cannot continue.
LINT1 motherboard interrupt. This is a hardware problem; please contact your hardware vendor.
When I checked the system logs on the iDRAC, I could see that a bus fatal error had been logged:
System Event Logs
| Severity | Time | Description |
|----------|------|-------------|
| Critical | 18:24:36 | The watchdog timer expired. |
| Normal | 18:16:37 | An OEM diagnostic event has occurred. |
| Critical | 18:16:36 | A bus fatal error was detected on a component at bus 4 device 4 function 0. |
I ran the integrated hardware diagnostics from System Services at boot (F10), which confirmed these errors, but only because it read the system logs. I find this really annoying: if I had cleared the event logs before running the hardware diagnostics, no errors would have been reported, and now I'm not sure whether the hardware is faulty or not.
Either way, I can't put the host back into production without further analysis, and I need to find out which hardware component sits at bus 4, device 4, function 0 so that I can log a support call with Dell. It turns out this is easy with the lspci command, which returns detailed information on all PCI devices.
lspci prints each device address in [domain]:[bus]:[device].[function] format, so you can grep for the address of the specific component instead of scrolling through every PCI device. Here is what mine returned:
~ # lspci | grep '000:004:04.0'
000:004:04.0 Bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane]
~ # lspci --help
lspci
  -p --pciinfo    Prints detailed info on all PCI devices
  -n --nolookup   Don't look up PCI device names and info
  -d --dump       Print hex dump of the full config space
  -v --verbose    Verbose information
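If you need to do this lookup more than once, the translation from the iDRAC wording to the lspci address can be scripted. The sketch below is only an illustration under some stated assumptions: the helper names sel_bdf and pci_addr are hypothetical, the iDRAC numbers are assumed to be decimal while lspci prints hex (for bus 4 device 4 the two are identical), the PCI domain is assumed to be 0, and the padding copies the 3-digit domain/bus layout in the output above (000:004:04.0). Other lspci builds print 0000:04:04.0 instead, so adjust the printf format to match what your host shows.

```shell
#!/bin/sh
# Sketch: turn an iDRAC SEL message into the address format lspci printed here.
# Assumptions (not from Dell or VMware docs): SEL numbers are decimal, the
# PCI domain is 0, and lspci uses the 000:004:04.0 layout seen in this post.

# Hypothetical helper: sel_bdf MESSAGE -> "BUS DEVICE FUNCTION"
sel_bdf() {
    printf '%s\n' "$1" |
        sed -n 's/.*bus \([0-9][0-9]*\) device \([0-9][0-9]*\) function \([0-9][0-9]*\).*/\1 \2 \3/p'
}

# Hypothetical helper: pci_addr DOMAIN BUS DEVICE FUNCTION -> hex address
pci_addr() {
    printf '%03x:%03x:%02x.%x\n' "$1" "$2" "$3" "$4"
}

msg='A bus fatal error was detected on a component at bus 4 device 4 function 0.'

# Split the three numbers from the SEL text into separate variables.
read -r bus dev fn <<EOF
$(sel_bdf "$msg")
EOF

addr=$(pci_addr 0 "$bus" "$dev" "$fn")
echo "Searching lspci for $addr"

# -F makes grep treat the dots in the address literally, not as wildcards.
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -F "$addr" || echo "No device found at $addr"
fi
```

For the log entry above this prints "Searching lspci for 000:004:04.0" before running the same grep shown earlier. Using grep -F rather than a regular expression avoids the unescaped dots in the address matching arbitrary characters.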
So now I know the error came from the PCI Express bridge (a PLX PEX 8624 switch), and I can log this with Dell in the hope that they simply replace the component under warranty.