Maintenance Mode for vRealize Operations Objects, Part 2

Part 1 of the “Maintenance Mode in vRealize Operations” blog series focused on a very basic scenario.

Put simply, we turned off data collection so that the object practically ceased to exist from the vROps point of view, because the requirements were straightforward. To accomplish this, we used the vROps Maintenance feature.

In this part we consider a slightly more complex scenario. The use case for this blog post is:

“In case of ESXi host maintenance mode I do not want to receive any alerts for the affected Host System objects.” As always, during the assessment we collected the following additional information:

  • Only a few ESXi hosts are in maintenance at the same time
  • The team doing the maintenance in vCenter has read-only access to the vROps UI
  • Automation could be used but is not mandatory
  • Metrics and properties need to be collected during the maintenance

Obviously, we cannot use the vROps Maintenance feature here: it turns off data collection, which would violate the last requirement, and the vCenter team would need additional permissions in vROps.

The ingredients for one possible solution are:

  • vCenter Maintenance Mode – the interface the vCenter team uses to start the host maintenance
  • vROps Policy – the place where we modify the behavior of vROps with regard to distinct objects
  • vROps Custom Group – the place where we group objects to apply a vROps Policy to them

The following picture shows a simplified model of Objects, Custom Groups and Policies in vROps:

Figure 1: Simplified object model – vROps objects and policies

Step 1 – Create a vROps Policy which implements the requirements.

In our case the requirement is to disable all alerts for Host Systems during ESXi maintenance, so we need a vROps Policy in which we disable the corresponding alert definitions.

We create the new policy based on an existing one, so only a few changes are needed to adjust the default behavior of vROps.

Figure 2: Creating new vROps Policy

In my test environment I am using my default policy as the baseline for the new policy.

Figure 3: Setting the policy baseline

We go to the Alert Definitions section and disable the corresponding alert definitions for the Host System objects.

To shorten the list of alert definitions, we can apply some filters. The following picture shows filters for the object type and the current state of the alert definitions. We can ignore alert definitions that are already disabled, as this is the desired state.

Figure 4: Alert definitions – display filter

The easiest way to get rid of all vROps Host System related alerts is to select all alert definitions in the list and disable them locally.

NOTE: This is a very simplified approach. In real life scenarios the requirements may be more complex and could result in alert definitions left enabled even during ESXi maintenance operations.

Figure 5: Disabling all Alert Definitions – 1
Figure 6: Disabling all Alert Definitions – 2
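To illustrate what "disable them locally" means in a policy that is based on another one, here is a minimal Python sketch. It is not how vROps implements policies internally; the `Policy` class, the alert definition names and the "enabled unless disabled" fallback are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Policy:
    """Hypothetical, simplified policy: a name, an optional base policy, local overrides."""
    name: str
    base: Optional["Policy"] = None
    # Maps an alert definition name to enabled (True) or disabled (False).
    local_overrides: Dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, alert_definition: str) -> bool:
        """Return the effective state: a local setting wins, otherwise ask the base policy."""
        if alert_definition in self.local_overrides:
            return self.local_overrides[alert_definition]
        if self.base is not None:
            return self.base.is_enabled(alert_definition)
        # Assumption for this sketch: a definition is enabled unless something disables it.
        return True


default_policy = Policy(name="Default Policy")

# Illustrative alert definition names, not taken from a real vROps installation.
host_alerts = ["Host has memory contention", "Host has CPU contention"]

# The new policy inherits everything and only disables the Host System alerts locally.
maintenance_policy = Policy(
    name="ESXi Maintenance",
    base=default_policy,
    local_overrides={alert: False for alert in host_alerts},
)
```

With this model, `maintenance_policy.is_enabled("Host has memory contention")` is `False`, while any definition that was not touched locally still follows the base policy.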

Step 2 – Create a vROps Custom Group which will contain the ESXi hosts in Maintenance Mode.

Following the simplified object model, we need a dedicated vROps Custom Group to pool the objects that should receive the new policy.

Figure 7: Creating a vROps Custom Group

“Group Type” is just an arbitrary categorization to reflect your environment. Once you have created a custom group, you cannot change the “Group Type” (at least not using the UI).

In Policy we select the newly created vROps Policy which disables the alert definitions.

“Keep group membership up to date”, when ticked, ensures that the membership is re-validated every 20 minutes.

To tell vROps which objects should be members of the group, we need to specify appropriate membership criteria.

Figure 8: Specifying the vROps Custom Group – 1

In the previous figure we select the “Maintenance Mode” property of the Host System object type as the criterion that determines whether a certain ESXi host system becomes a member of the custom group.

In the following picture we see a host in vCenter Maintenance Mode.
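Conceptually, the membership rule is a simple filter over all Host System objects. The following Python sketch shows the idea on plain dictionaries; it does not talk to vROps. The property key `runtime|maintenanceState`, the value `inMaintenance` and the host names are assumptions, so please check the actual property name and values in your environment.

```python
from typing import Dict, List

# Assumed property key and value; verify both in your own vROps instance.
MAINTENANCE_PROPERTY = "runtime|maintenanceState"
IN_MAINTENANCE = "inMaintenance"


def select_group_members(hosts: List[Dict]) -> List[str]:
    """Return the names of hosts whose maintenance property matches the criterion."""
    return [
        host["name"]
        for host in hosts
        if host.get("properties", {}).get(MAINTENANCE_PROPERTY) == IN_MAINTENANCE
    ]


# Hypothetical sample data: one host in maintenance, one not, one without the property.
hosts = [
    {"name": "esx01.example.local", "properties": {MAINTENANCE_PROPERTY: "inMaintenance"}},
    {"name": "esx02.example.local", "properties": {MAINTENANCE_PROPERTY: "notInMaintenance"}},
    {"name": "esx03.example.local", "properties": {}},
]

members = select_group_members(hosts)
```

Only `esx01.example.local` ends up in `members`. A host for which the property is missing is treated as not in maintenance, which is the safe choice here because its alerts stay enabled.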

Figure 10: vCenter – ESXi in Maintenance Mode

The “Preview” button lets us check the configured membership criteria.

Figure 11: Host in Maintenance Mode and member of the custom group

Now the ESXi host system is a member of the custom group and receives the new policy, which disables all alert definitions for this particular object. In vROps you can see the current policy of any selected object in the upper right corner.

Figure 12: Host in Maintenance Mode receiving the new vROps Policy

Once the maintenance has been completed, the vCenter admin ends the Maintenance Mode in vCenter. After up to 20 minutes the ESXi host is no longer a member of our custom group, and the previous policy (which could be the default one) is applied again.

Figure 13: Maintenance Mode ended – ESXi host receiving the original vROps Policy

As you have probably noticed, one drawback of this method is a possible gap of up to 20 minutes between a host entering the vCenter Maintenance Mode and vROps completing the re-evaluation of the custom group membership. This should be taken into account when designing the maintenance procedure.

If automation plays a role in your environment and the policy needs to be applied immediately after a host enters the vCenter Maintenance Mode, the corresponding vROps REST API call can be used to programmatically retrieve a list of ESXi hosts from vCenter and populate the custom group with the appropriate vROps objects. But this is another story…
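To get a feeling for this gap, the following Python sketch computes how long alerts could still be triggered for a host, given the time of the last membership re-evaluation. It only models the 20-minute cycle mentioned above and ignores any additional delay from property collection; the function name and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

# Re-evaluation interval of the custom group membership, as stated in the text.
REEVALUATION_INTERVAL = timedelta(minutes=20)


def wait_until_policy_applies(last_evaluation: datetime, entered_maintenance: datetime) -> timedelta:
    """Return the time between entering maintenance and the next membership re-evaluation."""
    if entered_maintenance < last_evaluation:
        raise ValueError("entered_maintenance must not be earlier than last_evaluation")
    elapsed = entered_maintenance - last_evaluation
    # Count the full intervals that have already passed, then step to the next one.
    intervals_passed = elapsed // REEVALUATION_INTERVAL
    next_evaluation = last_evaluation + (intervals_passed + 1) * REEVALUATION_INTERVAL
    return next_evaluation - entered_maintenance


# Hypothetical example: last re-evaluation at 10:00, host enters maintenance at 10:05.
gap = wait_until_policy_applies(
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
)
```

In this example `gap` is 15 minutes. The worst case in this model is a host that enters maintenance right at a re-evaluation, which then waits the full 20 minutes.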

Figure 14: vROps Custom Group REST API method
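As a rough idea of what such automation could look like, the following Python sketch builds a request body for a custom group with an explicit member list. It only constructs the JSON and does not send anything. The endpoint path, the field names and the `Container` adapter kind are assumptions that should be verified against the REST API documentation of your vROps version; the group name, group type and resource IDs are hypothetical.

```python
import json
from typing import Dict, List

# Assumed endpoint path; verify it against the REST API documentation of your vROps version.
GROUPS_ENDPOINT = "/suite-api/api/resources/groups"


def build_group_payload(group_name: str, group_type: str, host_resource_ids: List[str]) -> Dict:
    """Build a request body that lists the group members explicitly instead of by rule."""
    return {
        "resourceKey": {
            "name": group_name,
            "adapterKindKey": "Container",  # assumed adapter kind for custom groups
            "resourceKindKey": group_type,  # the "Group Type" chosen for the group
        },
        "membershipDefinition": {
            # Sorted and de-duplicated so that repeated runs produce the same body.
            "includedResources": sorted(set(host_resource_ids)),
            "excludedResources": [],
            "rules": [],
        },
    }


payload = build_group_payload(
    "ESXi Hosts in Maintenance",  # hypothetical group name
    "Environment",                # hypothetical group type
    ["hypothetical-id-2", "hypothetical-id-1", "hypothetical-id-2"],
)
body = json.dumps(payload, indent=2)
```

Sending `body` would additionally require an authenticated HTTP client, and the resource IDs would have to be looked up in vROps for the hosts that vCenter reports as being in maintenance. Both parts are left out of this sketch.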

In Part 3 I will focus on use cases comprising additional object types.