A trouble call reported an old server running Solaris 10 with a full root partition.
Looking around, I noticed that the FMD log files were quite large and the active file was still growing. To free up some space, I deleted the rotated files (i.e. .0, .1, etc.) and started looking into what the problem was.
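A quick way to see which files are eating the space is to rank everything under the log directory by size. This is only a minimal sketch: it assumes the logs live under /var/fm/fmd, the function name largest_files is made up for illustration, and it expects a POSIX-style shell such as ksh or bash.

```shell
# Hypothetical helper: list the biggest entries under a directory,
# largest first. Sizes are in kilobytes (du -k).
# Arguments: directory (default /var/fm/fmd), number of lines (default 10).
largest_files() {
    dir=${1:-/var/fm/fmd}
    count=${2:-10}
    du -ak "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example call on the affected host:
#   largest_files /var/fm/fmd 10
```

The first line of the output is normally the directory itself, since its total includes every file below it.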
The Solaris Fault Management Facility was created to provide a self-healing capability. Through the fmd daemon, it monitors various aspects of system health and, as in this case, logs messages for the system issues it detects.
The first obvious check was to use the fmadm faulty command to see if anything was flagged as faulty. In this case, there was a bad DIMM.
That alone wasn’t enough to fill the log files, so I had a look under /var/fm/fmd and found several entries for a processor.
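The FMD logs are normally read with fmdump, and counting events per class makes a noisy component stand out quickly. The sketch below assumes the short fmdump -e layout, a header line followed by one event per line with the class name as the last field. The function name count_ereport_classes is hypothetical.

```shell
# Hypothetical helper: read `fmdump -e` output on stdin and count
# error reports per class, most frequent class first.
# Skips the header line and any blank lines.
count_ereport_classes() {
    awk 'NR > 1 && NF > 0 { print $NF }' | sort | uniq -c | sort -rn
}

# Example call on the affected host (as root):
#   fmdump -e | count_ereport_classes | head
```

A class that dominates the list by a wide margin is a reasonable place to start looking for whatever is flooding the log.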
The fmstat command, which reports statistics logged by fmd and its modules, confirmed the log activity.
Since it was an old server with no warranty, the hardware people were notified to look at the server and retire it if they couldn’t repair it.
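To see which module is receiving the most events, the fmstat output can be sorted on its event counter. This is a sketch that assumes the default fmstat layout, where the first line is a header and the second column (ev_recv) is the number of events each module has received. The function name busiest_modules is made up for illustration.

```shell
# Hypothetical helper: read `fmstat` output on stdin, drop the header
# line, and sort modules by the second column (ev_recv), highest first.
busiest_modules() {
    awk 'NR > 1' | sort -k2,2nr
}

# Example call on the affected host (as root):
#   fmstat | busiest_modules | head -n 5
```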