Found 1 New Failures | ASM Health Checker
The alert "ASM Health Checker found 1 new failures" is a critical notification typically found in Oracle Automatic Storage Management (ASM) alert logs. It indicates that the GMON (Group Monitor) process has detected an issue, often a disk failure or a forced dismount, that requires immediate attention.

What This Alert Means
This message usually appears alongside other ORA- errors and signals that ASM has identified a problem with the storage layer. Common triggers include:
Disk Failures: A physical disk or a storage path (LUN) has become inaccessible.
Forced Dismounts: The diskgroup has been forced offline because it can no longer maintain its required redundancy (e.g., a disk failure in an EXTERNAL REDUNDANCY diskgroup).
Metadata Corruption: Corruption in the ASM metadata blocks, which can happen during intensive operations like rebalancing.
Configuration Issues: Problems during the addition of new disks or voting file refreshes.

Immediate Troubleshooting Steps
Check the ASM Alert Log: Locate the alert log for your ASM instance (often in /u01/app/oracle/diag/asm/.../trace/alert_+ASM.log). Look for the ORA- errors immediately preceding the "1 new failures" message to identify the specific disk or group affected.
Verify Disk Status: Run the following query in your ASM instance to check for offline or missing disks:
SELECT name, group_number, path, state, header_status FROM v$asm_disk;
Investigate the Incident: Oracle's Fault Diagnosability Infrastructure often generates an incident report when this occurs. Use the ADRCI tool to view the incident details:
show incident
show tracefile (for the specific process, like +ASM_rbal_xxxx.trc)
Monitor Rebalance/Repair: If a disk is just offline and you have redundancy, check the REPAIR_TIMER column in v$asm_disk to see how long you have to fix the issue before ASM automatically drops the disk.

When to Take Urgent Action
External Redundancy: If your diskgroup uses external redundancy and a disk fails, the group will likely dismount immediately, potentially crashing your database.
Intermediate States: If your Clusterware (Grid Infrastructure) resources show an INTERMEDIATE state after this alert, the diskgroup may be partially available but failing to fully mount.
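The alert-log step can be sketched as a small shell helper. This is only an illustration: the function name `ora_before_failure` and the default of 20 context lines are assumptions, and the pattern assumes the message appears in the log exactly as quoted in this article. It collects the distinct ORA- codes found in the lines leading up to each "new failures" message.

```shell
# Print the distinct ORA- codes that appear shortly before (or on) each
# "ASM Health Checker found N new failures" line in an alert log.
# Usage: ora_before_failure <alert_log> [lines_of_context]
ora_before_failure() {
    local log="$1" context="${2:-20}"   # 20 lines of context is an arbitrary default
    grep -B "$context" 'ASM Health Checker found [0-9]* new failures' "$log" \
        | grep -o 'ORA-[0-9]\{5\}' \
        | sort -u
}
```

It would be called with the path to your own alert log, for example `ora_before_failure "$ALERT_LOG" 30`, where `ALERT_LOG` is a variable you set yourself. The codes it prints are a starting point for reading the surrounding log lines, not a diagnosis.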
The alert "ASM Health Checker found 1 new failures" typically appears in your Oracle ASM alert logs when the Automatic Diagnostic Repository (ADR) health monitor detects a critical issue during a maintenance task, such as a diskgroup rebalance or a disk add operation.

Understanding the Failure
When this message occurs, it indicates that a health check—either triggered automatically by an incident or run manually—has identified a problem that could compromise your storage. Common triggers include:
Disk Failgroup Issues: A diskgroup has fewer failure groups than recommended (e.g., fewer than 3 for normal redundancy).
Disk Status/Mount Failures: Disks are missing, offline, or have lost membership.
Metadata Corruption: Corruption found in the first 250 blocks of an ASM disk, which contain essential metadata.
Quorum Loss: The diskgroup cannot maintain a read quorum, often leading to an automatic dismount.
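To make the "missing, offline, or lost membership" trigger concrete, here is a minimal sketch that filters disk rows for anything unusual. It assumes, purely for illustration, that you have exported the columns name, group_number, path, state and header_status from v$asm_disk as pipe-separated rows without a heading line. The function name `flag_suspect_disks` is hypothetical. It flags any row whose state is not NORMAL or whose header status is not MEMBER, so unused candidate disks will be listed too.

```shell
# Read rows of the form "name|group_number|path|state|header_status" on stdin
# and print those whose state is not NORMAL or whose header is not MEMBER.
# Assumes no heading row; surrounding spaces in each field are trimmed.
flag_suspect_disks() {
    awk -F'|' '
        NF >= 5 {
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if ($4 != "NORMAL" || $5 != "MEMBER")
                printf "%s (group %s, path %s): state=%s header=%s\n", $1, $2, $3, $4, $5
        }
    '
}
```

A typical call would pipe the exported rows in, for example `flag_suspect_disks < disks.txt`, where `disks.txt` is a file you produced yourself. An empty result means every listed disk is a NORMAL member of a group.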
When the ASM Health Checker reports "found 1 new failures," it usually indicates a critical disruption to the storage layer, often leading to a forced dismount of a disk group to prevent data corruption. This message is a summary alert that appears in the ASM Alert Log after a specific storage-related error has already occurred.

Common Causes
Missing or Inaccessible Disks: The most frequent cause is that one or more disks in a group are no longer reachable due to hardware failure, storage connectivity issues, or OS-level changes.
Metadata Corruption: If ASM detects invalid block headers or internal inconsistencies in the metadata, it may trigger a failure and dismount the group.
Insufficient Quorum: In diskgroups with redundancy (Normal or High), if too many disks or a required "voting" disk (PST) become unavailable, the group cannot maintain a read quorum and will fail.
I/O Errors: Significant write failures or heartbeat timeouts to the PST (Partnership and Status Table) will prompt the health checker to record a new failure.
Subject: [ALERT] ASM Health Checker Detected 1 New Failure - Immediate Investigation Required
For the disk mentioned in the failure detail:
# Check if the device exists (if using ASMLIB)
ls -l /dev/oracleasm/disks/
# or, for multipath devices
ls -l /dev/mapper/ | grep asm
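When several paths need checking, the same existence test can be wrapped in a loop. The following is a rough sketch rather than an Oracle-provided tool: the function name `check_devices` is made up for this example, and it only tests that each path exists, not that ASM can read it.

```shell
# Report each given device path as present or missing.
# Returns non-zero if at least one path is missing.
check_devices() {
    local rc=0 dev
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            echo "OK      $dev"
        else
            echo "MISSING $dev"
            rc=1
        fi
    done
    return "$rc"
}
```

You would pass it the paths reported for the affected diskgroup, for example `check_devices /dev/mapper/asm*`, adjusting the pattern to your own naming scheme.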
Any SAN, multipath, or OS upgrade should trigger a manual health check of the affected diskgroup (DATA in this example):
asmcmd chkdg DATA
When the ASM Health Checker reports "1 new failures," it means that its most recent check detected a new issue that could affect the health or performance of your ASM storage. Such issues range from configuration problems and performance bottlenecks to hardware failures.