Veritas Resiliency Platform 1.1

Table: Predefined risks lists the predefined risks available in Resiliency Platform. These risks are reflected in the current risk report and the historical risk report.

Table: Predefined risks

Each risk below lists its description, risk detection time, risk type, affected operation, and the fix if the check is violated.

New VM added to replication storage

  Description: Checks if a virtual machine that is added to a consistency group on a primary site is not a part of the resiliency group.
  Risk detection time: 5 minutes
  Risk type: Error
  Affected operation: Migrate, Takeover, Rehearse
  Fix if violated: Add the virtual machine to the resiliency group.

Replication lag exceeding threshold

  Description: Checks if the replication lag exceeds the thresholds that are defined by the user for each resiliency group.
  Risk detection time: 5 minutes
  Risk type: Warning
  Affected operation: Migrate, Takeover
  Fix if violated: Contact the appropriate administrator.

Replication state broken/critical

  Description: Checks if the replication is not working or is in a critical condition for each resiliency group.
  Risk detection time: 5 minutes
  Risk type: Error
  Affected operation: Migrate, Takeover
  Fix if violated: Contact the enclosure vendor.

Remote mount point already mounted

  Description: Checks if the mount point is not available for mounting on the target site for any of the following reasons:
    • Mount point is already mounted.
    • Mount point is being used by other assets.
  Risk detection time:
    • Native (ext3, ext4, NTFS): 30 minutes
    • Virtualization (VMFS, NFS): 6 hours
  Risk type: Warning
  Affected operation: Migrate, Takeover
  Fix if violated: Unmount the mount point that is already mounted or is being used by other assets.

Disk utilization critical

  Description: Checks if at least 80% of the disk capacity is being utilized. The risk is generated for all the resiliency groups associated with that particular file system.
  Risk detection time:
    • Native (ext3, ext4, NTFS): 30 minutes
    • Virtualization (VMFS, NFS): 6 hours
  Risk type: Warning
  Affected operation: Migrate, Takeover, Rehearse
  Fix if violated: Delete or move some files or uninstall some non-critical applications to free up some disk space.

Control host not reachable

  Description: Checks if the discovery daemon is down on the Control Host.
  Risk detection time: 15 minutes
  Risk type: Error
  Affected operation: Migrate
  Fix if violated: Resolve the discovery daemon issue.

ESX not reachable

  Description: Checks if the ESX server is in a disconnected state.
  Risk detection time: 5 minutes
  Risk type: Error
  Affected operation:
    • On primary site: start or stop operations
    • On secondary site: migrate or takeover operations
  Fix if violated: Resolve the ESX server connection issue.

vCenter Server not reachable

  Description: Checks if the virtualization server is unreachable or if the password for the virtualization server has changed.
  Risk detection time: 5 minutes
  Risk type: Error
  Affected operation:
    • On primary site: start or stop operations
    • On secondary site: migrate or takeover operations
  Fix if violated: Resolve the virtualization server connection issue. In case of a password change, resolve the password issue.

Insufficient compute resources on failover target

  Description: Checks if there are insufficient CPU resources on the failover target in a virtual environment.
  Risk detection time: 6 hours
  Risk type: Warning
  Affected operation: Migrate, Takeover
  Fix if violated: Reduce the number of CPUs assigned to the virtual machines on the primary site to match the available CPU resources on the failover target.
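
The predefined risks can also be read as a lookup from an operation to the risks that affect it. The following is a minimal Python sketch of that reading, using a subset of the rows above. The `Risk` class, its field names, and the `risks_affecting` function are hypothetical names chosen for illustration; they are not part of any Resiliency Platform API.

```python
from dataclasses import dataclass

# Hypothetical model of a few rows from the predefined risks table.
# The class and field names are assumptions made for this sketch.
@dataclass(frozen=True)
class Risk:
    name: str
    risk_type: str           # "Error" or "Warning", as listed under Risk type
    detection_minutes: int   # value listed under Risk detection time
    affected: frozenset      # operations listed under Affected operation

PREDEFINED_RISKS = [
    Risk("New VM added to replication storage", "Error", 5,
         frozenset({"Migrate", "Takeover", "Rehearse"})),
    Risk("Replication lag exceeding threshold", "Warning", 5,
         frozenset({"Migrate", "Takeover"})),
    Risk("Replication state broken/critical", "Error", 5,
         frozenset({"Migrate", "Takeover"})),
    Risk("Control host not reachable", "Error", 15,
         frozenset({"Migrate"})),
    Risk("Insufficient compute resources on failover target", "Warning", 360,
         frozenset({"Migrate", "Takeover"})),
]

def risks_affecting(operation, risks=PREDEFINED_RISKS):
    """Return the risks that list `operation`, errors before warnings."""
    hits = [r for r in risks if operation in r.affected]
    return sorted(hits, key=lambda r: (r.risk_type != "Error", r.name))

if __name__ == "__main__":
    for risk in risks_affecting("Takeover"):
        print(f"{risk.risk_type}: {risk.name} "
              f"(detection time: {risk.detection_minutes} minutes)")
```

Sorting errors ahead of warnings is a design choice of this sketch, not behavior documented for the product.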

Table: Other risks describes some risks that are displayed in the Resiliency Platform console but are not reflected in the risk reports.

Table: Other risks

HOST_SFMH_REINSTALLED

  Description: The host is disconnected. The probable cause is that the host has been reinstalled. Changes you make after this condition are not reflected on the Resiliency Manager. To correct this issue, remove and re-add this host to the Infrastructure Management Server (IMS).

HOST_DISCONNECTED_MAC_CHANGED

  Description: The host is disconnected. The probable cause is that the media access control (MAC) address of the host has changed. Changes you make after this condition are not reflected on the Resiliency Manager. To correct this issue, remove and re-add this host to the Infrastructure Management Server (IMS).

VMWARE_DISCOVERY_FAILED

  Description: VMware discovery failed.

FS_FILESYSTEM_FULL

  Description: The file system is at 100% usage.
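
The two disk-related conditions in this topic, the 80% threshold of the Disk utilization critical risk and the 100% usage of FS_FILESYSTEM_FULL, can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the function names are hypothetical, the script checks the local root path with the standard library call `shutil.disk_usage`, and it reports a single label per file system. It does not show how Resiliency Platform itself performs these checks.

```python
import shutil

# Threshold taken from the "Disk utilization critical" description:
# at least 80% of the disk capacity is being utilized.
WARNING_THRESHOLD = 80.0

def percent_used(total, used):
    """Return used capacity as a percentage of total capacity."""
    return 100.0 * used / total if total else 0.0

def classify_usage(percent):
    """Map a usage percentage to a risk label (a simplification for this sketch)."""
    if percent >= 100.0:
        return "FS_FILESYSTEM_FULL"
    if percent >= WARNING_THRESHOLD:
        return "Disk utilization critical"
    return None

if __name__ == "__main__":
    # "/" is an assumed path; replace it with the mount point to inspect.
    usage = shutil.disk_usage("/")
    pct = percent_used(usage.total, usage.used)
    print(f"/ is {pct:.1f}% used: {classify_usage(pct) or 'no risk'}")
```

If either label is reported, the fix listed for Disk utilization critical applies: delete or move files to free up disk space.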
