Part 1 of the “Maintenance Mode in vRealize Operations” blog series focused on a very basic scenario.
Put simply, we turned off data collection and made the object effectively non-existent from the vROps point of view, since the requirements were straightforward. To accomplish this, we used the vROps Maintenance feature.
In this part we consider a slightly more complex scenario. The use case for this blog post is:
“In case of ESXi host maintenance mode I do not want to receive any alerts for the affected Host System objects.”
As always, during the assessment we collected the following additional information:
- Only a few ESXi hosts are in maintenance at the same time
- The team doing the maintenance in vCenter has read-only access to the vROps UI
- Automation could be used but is not mandatory
- Metrics and properties need to be collected during the maintenance
Obviously, we cannot use the vROps Maintenance feature here: it would turn off data collection, which violates the last requirement, and the vCenter team would need additional permissions in vROps.
The ingredients for one possible solution are:
- vCenter Maintenance Mode – Interface for the vCenter team to start the host maintenance
- vROps Policy – place where we modify the behavior of vROps with regard to distinct objects
- vROps Custom Group – place where we group objects to apply a vROps policy to them
The following picture describes (simplified) the model of Objects, Custom Groups and Policies in vROps:
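The relationship between objects, custom groups and policies can also be sketched in a few lines of Python. This is only a toy model to illustrate the idea, not how vROps is implemented; the class and function names are made up for this example, and the sketch does not model how vROps decides between several groups that contain the same object.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str


@dataclass
class CustomGroup:
    name: str
    policy: Policy
    members: set = field(default_factory=set)  # names of member objects


def effective_policy(obj_name, groups, default_policy):
    """Return the policy of the first group containing the object.

    Assumption: an object outside every group falls back to the default policy.
    """
    for group in groups:
        if obj_name in group.members:
            return group.policy
    return default_policy


default = Policy("Default Policy")
maintenance = Policy("ESXi Maintenance")
group = CustomGroup("Hosts in Maintenance", maintenance, {"esx01"})

for host in ("esx01", "esx02"):
    print(host, "->", effective_policy(host, [group], default).name)
```

The point of the model is that a policy is never attached to a host directly; the host receives it through its group membership.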
Step 1 – Create a vROps Policy which implements the requirements.
In our case the requirement is to disable all alerts for Host Systems during ESXi maintenance; hence, we need a vROps Policy in which we disable the corresponding alert definitions.
The new policy is based on an existing one, so only a few changes are needed to tweak the default behavior of vROps.
In my test environment I am using my default policy as the baseline for the new policy.
We go to the Alert Definitions section and disable the corresponding alert definitions for the Host System objects.
To reduce the list of alert definitions, some filtering options may be applied. In the following picture you see filters for object type and the current state of the alert definitions. We do not care about alert definitions that are already disabled, as this is the desired state.
The easiest way to get rid of all vROps Host System related alerts is to select all alert definitions in the list and disable them locally.
NOTE: This is a very simplified approach. In real-life scenarios the requirements may be more complex and could result in some alert definitions being left enabled even during ESXi maintenance operations.
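What Step 1 does to the policy can be expressed as a small sketch: copy the states from the base policy, filter by object type, and override the matching alert definitions locally. The data layout and the alert definition names below are hypothetical and only serve as an illustration; they are not taken from a real vROps policy export.

```python
def derive_maintenance_policy(base_states, object_types, target_type="Host System"):
    """Return a copy of the base policy with all alerts for target_type disabled.

    base_states:  alert definition name -> enabled (True/False) in the base policy
    object_types: alert definition name -> object type the definition applies to
    """
    derived = dict(base_states)  # inherit everything from the base policy
    for name in derived:
        if object_types[name] == target_type:
            derived[name] = False  # local override: disabled
    return derived


# Hypothetical alert definitions, for illustration only.
base = {"Host alert A": True, "Host alert B": False, "VM alert C": True}
types = {
    "Host alert A": "Host System",
    "Host alert B": "Host System",
    "VM alert C": "Virtual Machine",
}

print(derive_maintenance_policy(base, types))
```

Alert definitions for other object types keep the state inherited from the base policy, which is why only a few changes are needed.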
Step 2 – Create a vROps Custom Group which will contain the ESXi hosts that are in Maintenance Mode.
Following the simplified object model, we need a dedicated vROps Custom Group to pool the objects we want to receive the new policy.
“Group Type” is just an arbitrary categorization to reflect your environment. Once you have created a custom group, you cannot change the “Group Type” (at least not using the UI).
In Policy we select the newly created vROps Policy which disables the alert definitions.
Ticking “Keep group membership up to date” ensures that the membership is re-validated every 20 minutes.
To tell vROps which objects should be members of the group, we need to specify appropriate membership criteria.
In the previous figure we select the “Maintenance Mode” property of the Host System object type as the criterion to determine whether a certain ESXi host system should become a member of the custom group or not.
In the following picture we see a host in vCenter Maintenance Mode.
The “Preview” button lets us check the configured membership criteria.
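Conceptually, the membership criterion is a simple filter over the properties of all Host System objects. The following sketch mimics such a preview; the property key `"Maintenance Mode"` is the display name used in this post, and the value `"true"` is an assumption made for the example, since the actual value reported by vROps may differ.

```python
def preview_members(hosts, prop="Maintenance Mode", expected="true"):
    """Return the names of all hosts whose property matches the criterion.

    hosts: host name -> dict of property name -> value
    prop/expected: assumed key and value, adjust to what your vROps reports
    """
    return sorted(
        name for name, props in hosts.items() if props.get(prop) == expected
    )


# Hypothetical inventory, for illustration only.
inventory = {
    "esx01": {"Maintenance Mode": "true"},
    "esx02": {"Maintenance Mode": "false"},
    "esx03": {},  # property not collected yet
}

print(preview_members(inventory))
```

A host without the property is simply not matched, so it stays outside the group and keeps its current policy.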
Now the ESXi host system is a member of the custom group and receives the new policy, which disables all alert definitions for this particular object. In vROps one can see the current policy for any selected object in the upper right corner.
Once the maintenance has been completed, the vCenter admin disables the Maintenance Mode in vCenter. After up to 20 minutes the ESXi host is no longer a member of our custom group and the previous policy (which could be the default one) is applied again.
As you have probably noticed, one drawback of this method is a possible gap of up to 20 minutes between a host entering the vCenter Maintenance Mode and vROps completing the re-evaluation of the custom group membership. This should be taken into account when designing the maintenance procedure.
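To get a feeling for the gap, here is a small worked example. It assumes that the re-validation runs on a fixed 20-minute cadence starting from a known last run, which is a simplification made for this sketch; the function name and the timestamps are made up.

```python
from datetime import datetime, timedelta

CYCLE = timedelta(minutes=20)  # re-validation interval mentioned above


def next_revalidation(last_run, event_time, cycle=CYCLE):
    """Return the first re-validation time at or after event_time."""
    if event_time <= last_run:
        return last_run
    elapsed = event_time - last_run
    cycles = -(-elapsed // cycle)  # ceiling division on timedeltas
    return last_run + cycles * cycle


last_run = datetime(2024, 1, 1, 10, 0)
entered_maintenance = datetime(2024, 1, 1, 10, 5)

applied_at = next_revalidation(last_run, entered_maintenance)
print("policy applied at:", applied_at)
print("gap:", applied_at - entered_maintenance)
```

In this example the host enters Maintenance Mode five minutes after the last re-validation, so alerts could still fire for 15 minutes. In the worst case, entering Maintenance Mode right after a re-validation, the gap approaches the full 20 minutes.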
If automation plays a role in your environment and the policy needs to be applied immediately after a host enters the vCenter Maintenance Mode, the corresponding vROps REST API call can be leveraged to programmatically retrieve a list of ESXi hosts from vCenter and populate the custom group with the appropriate vROps objects. But this is another story…
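Just to outline the idea: the core of such an automation is comparing the hosts currently in Maintenance Mode with the current members of the custom group. The sketch below covers only that comparison. Retrieving the host list from vCenter and updating the group through the vROps REST API are deliberately left out, and the function name and return format are assumptions made for this example.

```python
def plan_group_update(hosts_in_maintenance, current_members):
    """Work out which hosts to add to or remove from the custom group.

    hosts_in_maintenance: host names reported by vCenter as in Maintenance Mode
    current_members:      host names currently in the vROps custom group
    """
    wanted = set(hosts_in_maintenance)
    current = set(current_members)
    return {
        "add": sorted(wanted - current),     # entered maintenance, not yet in group
        "remove": sorted(current - wanted),  # left maintenance, still in group
    }


# Hypothetical input, for illustration only.
print(plan_group_update(["esx01", "esx03"], ["esx01", "esx02"]))
```

Triggering this comparison right after the maintenance task starts would close the gap described above, at the price of maintaining the group membership outside of vROps.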
In Part 3 I will focus on use cases comprising additional object types.