Salt Extension Modules for VMware – Quick How-To

My colleague Vincent Riccio described here in his blog post the open-source SaltStack modules that provide hooks into components such as VMware Cloud on AWS, NSX-T, and vSphere.
These modules are a fantastic way to implement prescriptive configuration management across various VMware infrastructure components using the same solution you already use for software and configuration management of your operating systems and applications – vRealize Automation SaltStack Config.

In this blog post, I will show you how easy it is to install and use the Salt Extension Modules for VMware using the vSphere vCenter module as an example.

Pre-Requisites

I have modified the following Quickstart to fit my SaltStack setup.

The components running in my lab for this quick demo are:

  • vRealize Automation SaltStack Config instance
  • SaltStack minion on a Linux VM

The next picture shows my Salt minion running on a CentOS 8 Linux VM. This will be the dedicated minion I use to execute the VMware modules.

Figure 1: Salt minion for the extension modules.

Configuration Steps

Step 1: We need to provide basic information to let SaltStack connect to the vCenter Server. Usually, we use Salt pillars to specify such configuration variables. In the next picture, you see the pillar I have created for my vCenter instance.

Figure 2: Salt pillar containing vCenter login information.

Please be aware that the user name is case sensitive.
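For reference, the pillar shown in Figure 2 corresponds to an SLS file along these lines – a sketch in which the file name vmware.sls is my assumption, while the vmware_config structure matches the pillar.items output in step 3:

# /srv/pillar/vmware.sls - illustrative pillar file
vmware_config:
  host: vc-demo.xxx.xxx
  user: Administrator@demo.local
  password: xxxxxxx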

Step 2: Update the target, in my use case the dedicated minion, to include the data in this pillar.

Figure 3: Updating the target with the pillar data.
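If you maintain pillars on a Salt master instead of through the SaltStack Config UI, the equivalent assignment would be a pillar top file roughly like the following sketch (minion ID and file names are illustrative):

# /srv/pillar/top.sls - illustrative pillar top file
base:
  'tk-lin-131':
    - vmware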

Step 3: With the following command, executed on the target Salt minion, we can check whether the pillar has been applied and the minion has all the needed information.

[root@tk-lin-131 ~]# salt-call pillar.items
local:
    ----------
    vmware_config:
        ----------
        host:
            vc-demo.xxx.xxx
        password:
            xxxxxxx
        user:
            Administrator@demo.local

Step 4: Install the Salt Extension Modules for VMware on the minion with the following command as described in the Quickstart.

$ salt-call pip.install saltext.vmware

If you receive an error pointing to an outdated pip version, simply upgrade pip on the minion:

python3 -m pip install --upgrade pip

Step 5: Check if the modules are available on your minion (the output is truncated to display only the relevant modules):

[root@tk-lin-131 ~]# salt-call --local sys.list_modules
local:
    - nsxt_compute_manager
    - nsxt_ip_blocks
    - nsxt_ip_pools
    - nsxt_license
    - nsxt_manager
    - nsxt_policy_segment
    - nsxt_policy_tier0
    - nsxt_policy_tier1
    - nsxt_transport_node
    - nsxt_transport_node_profiles
    - nsxt_transport_zone
    - nsxt_uplink_profiles
    - vmc_dhcp_profiles
    - vmc_direct_connect
    - vmc_distributed_firewall_rules
    - vmc_dns_forwarder
    - vmc_nat_rules
    - vmc_networks
    - vmc_public_ip
    - vmc_sddc
    - vmc_sddc_host
    - vmc_security_groups
    - vmc_security_rules
    - vmc_vpn_statistics
    - vmware_cluster
    - vmware_cluster_drs
    - vmware_cluster_ha
    - vmware_datacenter
    - vmware_datastore
    - vmware_dvswitch
    - vmware_esxi
    - vmware_folder
    - vmware_license_mgr
    - vmware_tag
    - vmware_vm

Step 6: Check if the minion is successfully connecting to the vCenter specified in the pillar and if the modules are working as expected (output truncated for visibility):

[root@tk-lin-131 ~]# salt-call vmware_datacenter.list
local:
    - Demo-Datacenter
[root@tk-lin-131 ~]# salt-call vmware_cluster.get cluster_name=HP-Cluster datacenter_name=Demo-Datacenter
local:
    ----------
    drs:
        ----------
        advanced_settings:
            ----------
        default_vm_behavior:
            fullyAutomated
        enable_vm_behavior_overrides:
            True
        enabled:
            True
        vmotion_rate:
            3
    drs_enabled:
        True

Step 7: Now we can start creating Salt state files, which will be an integral and prescriptive part of our configuration management.

The following state file is just a very simple example. It configures a few security settings on all ESXi hosts in the vCenter we specified in the pillar in step 1.

set_sec_config_max_days:
  module.run:
    - name: vmware_esxi.set_advanced_config
    - config_name: Security.PasswordMaxDays
    - config_value: 99998

set_sec_config_unlock_time:
  module.run:
    - name: vmware_esxi.set_advanced_config
    - config_name: Security.AccountUnlockTime
    - config_value: 899

Step 8: We can apply this state file to our dedicated minion using e.g. a Salt job as shown in the next picture.

Figure 4: Applying the state file as Salt job.
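If you prefer the command line over a Salt job, the equivalent call on the minion would look like this sketch (the state file name esxi-security is my assumption):

[root@tk-lin-131 ~]# salt-call state.apply esxi-security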

Step 9: In the last step we can finally check the outcome. We can use the corresponding get command on our minion or just review the settings in vCenter.

[root@tk-lin-131 ~]# salt-call vmware_esxi.get_advanced_config config_name=Security
local:
    ----------
    hp-demo01.xxx.yyy:
        ----------
        Security.AccountLockFailures:
            5
        Security.AccountUnlockTime:
            899
        Security.PasswordHistory:
            0
        Security.PasswordMaxDays:
            99998
        Security.PasswordQualityControl:
            retry=3 min=disabled,disabled,disabled,7,7
Figure 5: Advanced configuration of an ESXi host in vCenter.

Some final notes

Please note that after changing the state file it may take Salt a few seconds to reflect that change in the virtual file system. If you run a Salt job immediately after changing the state file, Salt may use the “old version”.

In my example, I have used an execution module. Usually, you would use a state module to check a setting and only apply a configuration if there is a deviation. At the time of writing this post, the ESXi state module does not support checking the Advanced Configuration. Since this is an open-source module, anyone can try to implement it :-)

Stay safe.

Thomas – https://twitter.com/ThomasKopton

Exclude “Aggregate” Instanced Metric using vRealize Operations Super Metric

As you know, vRealize Operations collects tons of metrics. Some of these metrics are so-called “instanced metrics” and are disabled in the default configuration of newer vROps versions. A list of disabled instanced metrics for e.g. the Virtual Machine object type is available here:

https://docs.vmware.com/en/vRealize-Operations/8.6/com.vmware.vcom.metrics.doc/GUID-1322F5A4-DA1D-481F-BBEA-99B228E96AF2.html#disabled-instanced-metrics-18

If you need any of those metrics, you can enable them in your vRealize Operations Policy.

Figure 01: Enabling disabled instanced metric in vROps policy.

As you can see in the previous picture, there is an option to specify instances you would like to include or exclude. In my example, I am excluding the CPU (or CPUs) containing “1” in the instanced metric name. Yes, it does not make much sense – it is just an example :-)

Problem statement

In addition to, or as a replacement for, some of the disabled instanced metrics, vRealize Operations provides an “Aggregate of all instances” metric, as in this example for Virtual Disk metrics.

Figure 02: “Aggregate of all instances” metric.

The problem is that in situations where you want to evaluate the instanced metrics to find the maximum, minimum, etc. – for example in views or super metrics – the aggregate metric may also be taken into the equation.

Use case

One of my customers described a very interesting and important use case.

“I want to determine the highest average write request size”.

One logical way would be to use vRealize Operations Super Metric and create a formula like this one:

max({This Resource: Virtual Disk|Average Write request size (bytes)}) 

Unfortunately, that approach does not work.

As described in the “Problem statement”, this calculation includes the aggregate metric, “VirtualDisk|Aggregate of all Instances”, which leads to a wrong result. For example, if two virtual disks report average write request sizes of 16 KB and 32 KB, the aggregate instance reports the sum of 48 KB, and max() returns 48 KB instead of the expected 32 KB.

Possible solution

Please be aware that this is ONE possible solution with one drawback that I will explain at the end.

The approach is to exclude the aggregated metric from the formula.

What we cannot do, or at least I do not know how to do, is exclude a metric based on the instance name.

What we can do is leverage the assumption that the aggregate will usually be greater than any single instance, as it is the sum of all instances. And this is the mentioned drawback – the approach works only when the following assumptions are true:

  • the count of instances is greater than 1
  • at least 2 instances have a value > 0 at the time of the super metric evaluation

I am working on an improved version of the formula to get rid of the assumption. For the time being, this is what works, taking the mentioned assumptions into account:

max(${this, attribute=virtualDisk|writeIOSize_latest, where=($value < ${metric=virtualDisk:Aggregate of all instances|writeIOSize_latest})})

This formula evaluates only the instances with values less than the value of the aggregate metric.

Figure 03: Highest average write request size.

Outlook

The improved formula will include some if-then-else statements.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

Quick Tip – Programmatically Update vRealize Log Insight Webhook Token

The Webhook feature in vRealize Log Insight is a great way to execute automation tasks, push elements into a RabbitMQ message queue, or start any other REST operation.

Many endpoints providing such REST methods require token-based authentication, like the well-known bearer token. vRealize Automation is one example of such an endpoint.

It is pretty easy to specify that kind of authentication in a vRealize Log Insight webhook; the only thing you need to do is add the Authorization header and set its value to Bearer xyz... .

Figure 01: Authorization header in vRLI webhook.

The problem with such tokens is their expiration. Usually, such tokens are valid only for a certain time period and need to be refreshed.

How to get a new token from vRealize Automation is not the subject of this post; we assume we have a new token, let’s say abcd1234. Now we need to update that value in vRLI.

We use the vRealize Log Insight REST API to update the Webhook configuration which contains the headers.

Before we can do that, we need the Bearer Token for the vRealize Log Insight REST API. This is the curl command to retrieve the token:

curl --location --request POST 'https://$VRLI-FQDN/api/v2/sessions' --header 'Accept-Encoding: application/json' --header 'Content-Type: application/json' --data-raw '{ "username": "admin", "password": "secret", "provider": "Local"}'

The response contains the token we use for all subsequent REST API calls (I have changed the token for better readability):

{
    "userId": "2338ddb2-xxx-yyy",
    "sessionId": "123456789assdfg",
    "ttl": 1800
}

Now we have everything we need to update the Webhook configuration to include a new vRealize Automation token, abcd1234. This is a two-step process.

First, we need the ID of our Webhook. This is the REST call to retrieve all webhooks:

curl --location --request GET 'https://$VRLI-FQDN/api/v2/notification/webhook' \
--header 'Accept-Encoding: application/json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer 123456789assdfg'

From the (truncated) response, we extract the ID of the webhook we would like to modify. Please be aware that the output JSON will contain separate sections for every webhook in your vRealize Log Insight instance:

{ "id": "96211daa-4391-42bd-b972-52689b7eb540", "URLs": [ "https://$VRLI-FQDN/codestream/api/pipelines/db1259ea-e4ea-4a91-89fc-70a3b5a2004b/executions" ], "destinationApp": "custom", "contentType": "JSON",

Second, we can now use a PUT REST call to update the webhook with the new Bearer Token for vRealize Automation.

curl --location --request PUT 'https://$VRLI-FQDN/api/v2/notification/webhook/432be7da-9876-4c9f-a11a-bb1fe131b2c4' --header 'Accept-Encoding: application/json' --header 'Content-Type: application/json' --header 'Authorization: Bearer 123456789assdfg' --data-raw '{ $BODY_FROM_PREVIOUS_RESPONSE_FOR_THE_CORRESPONDING_WEBHOOK_ID }'

The REST body is basically the content of the corresponding webhook ID section from the JSON output we retrieved in the step 1 response. We just need to replace the token here:

{ "id": "432be7da-9876-4c9f-a11a-bb1fe131b2c4", "URLs": [ "https://{ "id": "96211daa-4391-42bd-b972-52689b7eb540", "URLs": [ "https://$VRLI-FQDN/codestream/api/pipelines/db1259ea-e4ea-4a91-89fc-70a3b5a2004b/executions" ], "destinationApp": "custom", "contentType": "JSON",/codestream/api/pipelines/b12c7cbd-6d83-4bef-8b31-7bb653ee01aa/executions" ], "destinationApp": "custom", "contentType": "JSON", "payload": "{\n \"comments\": \"Starting pipeline using REST - from vRLI\",\n \"input\": {\n \"messages\": \"${messages}\",\n \"TriggeredAt\": \"${TriggeredAt}\",\n \"my_systemID\": \"${ vmw_vr_ops_id}\",\n \"SourceInfo\": \"${SourceInfo}\",\n \"HitOperator\": \"${HitOperator}\",\n \"Info\": \"${Info}\"\n}\n}", "name": "tkopton-CS-Pipeline-Test", "headers": "{\"Content-Type\":\"application/json\",\"Action\":\"POST\",\"Authorization-User\":\"undefined\",\"Authorization-Password\":\"undefined\",\"Authorization\":\"Bearer abcd1234\"}", "acceptCert": false, "creatorEmail": "xxxxxxx"}

Replacing the token in vRealize Log Insight is that simple.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

vCenter Events as vRealize Operations Alerts using vRealize Log Insight

As you probably know, vRealize Operations provides several symptom definitions based on message events out of the box (OOTB) as part of the vCenter solution content. You can see some of them in the next picture.

Figure 1: vCenter adapter message event symptoms.

These events are used in alert definitions to raise vRealize Operations alarms any time one of those events is triggered in any of the managed vCenter instances.

If you take a look into vCenter events in the Monitoring tab or check the available events as presented by the VMware Event Broker Appliance (VEBA) integration, you will see that there are tons of other events you may want to use to raise alerts.

Figure 2: vCenter events in VEBA plugin.

Unfortunately, this is not always as easy as creating a new message event symptom definition in vROps. Not every event is intercepted by vRealize Operations.

Now, you could of course use VEBA to run functions triggered by such events and let the functions raise alerts, create tickets, etc. This is definitely a great option, and how to do that using VEBA functions and vROps is something I am planning to describe in an upcoming blog post. But there are also other ways to achieve this.

If you run vRealize Log Insight integrated with vRealize Operations in your environment – a highly recommended setup – you have another, very easy option to raise alerts on any available vCenter event, as long as that event is logged by the vCenter instance. That should be the case for all, or at least the majority, of the events.

In the next picture, you see all the various events I have received in my vRLI from vCenter within the last 48 hours. For better visibility, I have excluded all events generated by vim.event.eventex.

Figure 3: vCenter events in vRealize Log Insight.

To search, filter, and display such events by their type, I have created the following extracted field in vRLI:

Figure 4: vRealize Log Insight extracted field for vCenter events.
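The exact definition is what you see in Figure 4; conceptually it boils down to something like the following sketch, where the regular expressions are my assumptions based on the sample message format rather than a verbatim copy of the field definition:

Extracted field vc_event (illustrative):
  extracted value: vim\.event\.\w+
  pre context:     \[
  post context:    \]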

This extracted field makes it now easy to create alert definitions in vRLI.

Let us assume our use case is: “I need an alert every time any vSphere cluster configuration has been changed.”

The corresponding event in vCenter is created by vim.event.ClusterReconfiguredEvent and sent to vRLI as a log message.

And this is the corresponding log message in vRLI after I have changed the DRS configuration of one of my clusters.

Figure 5: ClusterReconfiguredEvent message in vRLI.

To get such events as an alarm in vRealize Operations in general we need two things:

  • vRealize Operations integration in vRealize Log Insight. With that integration, vRLI is capable of mapping vCenter objects that are sources of messages to their counterparts in vROps. With this feature, vRLI alarms can be forwarded to vROps and attached to exactly the object that is the original source of the message received by vRLI.
  • An alert definition in vRealize Log Insight that is triggered every time an event of type vim.event.ClusterReconfiguredEvent is received and that is forwarded to vROps. For this alert definition we will use the extracted field described in Figure 4.

But there is still a little more work to do to implement a solution that really fulfills our requirement: get an alert every time a cluster configuration change happens.

Let us assume the following situation: someone is changing the configuration of one or several clusters very frequently. Our vRLI alert definition looks as shown in the next picture.

Figure 6: First alert definition in vRealize Log Insight.

And if we now run a query on this alert definition, we see that vRLI is properly triggering the alarms. In the picture, we see three alarms raised because of three changes within 10 minutes.

Figure 7: Event messages triggering alarms in vRLI.

The problem with the vROps integration is that the first alarm will be properly forwarded to vROps and will raise an alarm on the vCenter instance, but any subsequent alarm will not be reflected in vROps as long as the first alarm is still in the “open” state. We see the first alarm in vROps in the next figure.

Figure 8: First alarm in vRealize Operations.

This behavior is due to the notification event text being the same for every alarm. In that case, vROps assumes that the next occurrence reports the same issue, so there is no need to raise another, duplicate alarm. In our case, the notification event text is the name of the alarm as defined in the vRLI alert definition: tkopton-ClusterConfigChanged.

To change this behavior we need to include unique information for every alarm in the alarm name.

What we can do is customize the alert name by including a field or an extracted field in the format ${field-name}.

The challenge is to find such unique information in the event log message. Let’s see what we have. This is a sample event message as received in vRLI:

2021-11-14T10:18:06.795866+00:00 vcenter01 vpxd 7888 - -  Event [9150192] [1-1] [2021-11-14T10:18:06.795507Z] [vim.event.ClusterReconfiguredEvent] [info] [VSPHERE.LOCAL\Administrator] [Datacenter-01] [9150191] [Reconfigured cluster CL01 in datacenter Datacenter-01 
 
 Modified: 
 
configurationEx.drsConfig.enableVmBehaviorOverrides: false -> true; 

configurationEx.proactiveDrsConfig.enabled: false -> true; 

 Added: 
 
 Deleted: 
 
]

It looks like every event has a unique event ID – the key property, as described in the vSphere API documentation. I have created an extracted field for the event ID:

Figure 9: vRLI extracted field for vCenter EventID.
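Again, the actual definition is shown in Figure 9; a hedged approximation, based on the sample message where the ID appears as “Event [9150192]”, could look like this:

Extracted field vc_event_id (illustrative):
  extracted value: \d+
  pre context:     Event \[
  post context:    \]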

This extracted field can be now used as part of the name in the alert definition, which will make every occurrence unique in vROps. In the next picture, you can see the modified alert definition in vRLI.

Figure 10: Final alert definition in vRealize Log Insight.

Let’s do some vSphere cluster reconfigurations.

Figure 11: New event messages triggering alarms in vRLI.

And this is how it looks in vROps after vRLI has forwarded these alarms to vRealize Operations. First, we check the symptoms; see the next picture.

Figure 12: Notification event symptoms in vRealize Operations.

And here we see the corresponding alarms in vROps.

Figure 13: New alarms in vRealize Operations.

With these alarms, you could now create vROps notifications, start webhook-triggered actions, parse the content, and automate the remediation. Yes, especially around the alert name in vRLI using the extracted field there is still some room for improvement, but the approach described here is sufficient for many of the use cases I have worked with.

Have fun implementing your use cases.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

vRealize Operations AMQP Integration using Webhooks

With version 8.4, vRealize Operations introduced the Webhook Outbound Plugin feature. The new webhook outbound plugin works without any additional software – the webhook shim server becomes obsolete.

In this post, I will explain how to integrate vRealize Operations with an AMQP system. For this exercise, I have deployed a RabbitMQ server, but the concept should be the same for any AMQP implementation.

AMQP Basic Concept

Without going into the details of AMQP, the very basic concept is to provide a queue for producers and consumers. Producers put items into the queue, and consumers pick up these items and do whatever they are supposed to do with them.

Items may be e.g. messages – hence Advanced Message Queuing Protocol, or AMQP. One of the best-known implementations of AMQP is RabbitMQ.

Figure 1: Basic AMQP concept

In the context of vRealize Operations, we could consider vROps the producer and triggered alerts the items we could put into a queue to let consumers retrieve the items and do some work.

RabbitMQ Exchange and Queue

As the first step, I have configured my RabbitMQ instance with three queues:

  • vrli.alert.open – for vRealize Log Insight alerts
  • vrops.alert.open – for new vROps alerts
  • vrops.alert.close – for canceled vROps alerts

As shown in the next picture, all three queues use the amq.direct exchange.

Figure 2: RabbitMQ queue and exchange concept

The actual binding between exchange and queue is based on a routing key, as shown in the next picture for the vrops.alert.open queue.

Figure 3: Exchange-queue binding example

This routing key will be used later on in the payload to route the message to the right queue.

Webhook Outbound Plugin

The new Webhook Outbound Plugin provides a generic way to integrate (almost) any REST API endpoint without the need for a webhook shim server.

The configuration, as with any outbound plugin, requires the creation of an instance. The configuration of the instance for the RabbitMQ integration is displayed in the following picture. If you are using other exchanges, hosts, etc. in your RabbitMQ instance, you will need to adjust the URL accordingly.

Figure 4: Webhook Outbound Plugin instance configuration for RabbitMQ

NOTE: The test will fail because the test routine does not provide the payload expected by the publish REST API method. You still need to provide working credentials; ignore the test error message and save the instance.

Payload Template

Payload Templates are the next building block in the concept. Using the new Payload Templates, you can configure the desired outbound payload granularly, down to the single-metric level. The following picture shows an example of the payload configuration used for the message reflecting a new open alert in vRealize Operations.

Figure 5: Payload template for vROps open alert

Especially important are the “routing key” and the “payload” parts. The former ensures that the message is published to the right queue; the latter is what the consumer expects. In my use case, it is just an example containing only a portion of the available data.
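For orientation, the publish method of the RabbitMQ management API expects a JSON body along these lines; the routing key matches the binding from Figure 3, while the payload content shown here is purely illustrative and not the verbatim vROps template syntax:

{
  "properties": {},
  "routing_key": "vrops.alert.open",
  "payload": "{ \"alertName\": \"...\", \"alertId\": \"...\" }",
  "payload_encoding": "string"
}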

Both payload template examples, one for new (open) alerts and one for canceled (close) alerts, are available on the VMware Code page:

VMware Code – Sample Exchange

Notification

The last step is to create appropriate vRealize Operations alert notifications, which are triggered as soon as the specified criteria are met, and to configure the outbound instance and the payload for RabbitMQ as shown in the next picture.

Figure 6: Notification settings

And this is the result: messages published to all three queues.

Figure 7: Queues with messages

An example message looks like this one.

Figure 8: vROps open alert message

The missing part now is the consumers. A consumer could be a vRealize Orchestrator workflow subscribed to a queue or any other consumer processing AMQP messages. Maybe something for a future blog post?

Stay safe.

Thomas – https://twitter.com/ThomasKopton

Wavefront by VMware and Log Ingestion

Motivation

In my last post, I described how to ingest power consumption data provided by the vzlogger project into vRealize Log Insight and how to extract the actual metrics from the log messages.

That setup has been working as expected for days now, but the primary use case for vRealize Log Insight is intelligent logging and analytics of structured, semi-structured, and unstructured messages.

In my electric power consumption use case, I am interested in collecting and analyzing time-series data in real time.

And this is exactly the use case for Wavefront by VMware:

https://tanzu.vmware.com/observability

Logical Design

The logical design is fairly simple; only three components are needed:

  • Wavefront SaaS access
  • Wavefront Proxy – your on-premises part of the Wavefront solution
  • Log data source – in my scenario it is still my Raspberry Pi running Raspbian, providing the data via rsyslog

Figure 01: Logical design

In my setup, rsyslog sends the messages every 5 seconds to the proxy, and the proxy sends the extracted metric(s) to the SaaS Wavefront endpoint.

The actual installation and configuration consist of three steps:

  • Wavefront Proxy installation
  • Wavefront Proxy configuration
  • rsyslog configuration

Wavefront Proxy

The installation of the proxy is basically one simple step. In the Wavefront UI, a click on “ADD NEW PROXY” shows details on how to install the proxy as:

  • Linux installation
  • Windows installation
  • Mac installation
  • Docker image

Figure 02: Proxy installation – “Add New Proxy”

I have installed my proxy on an Ubuntu 18.04 server VM using the command provided by the Wavefront UI, which includes an auto-generated API token.

Figure 03: Proxy installation on Linux

The proxy automatically connects to the Wavefront service and is ready for forwarding metrics to the cloud.

What is missing now is the proper configuration to receive log messages and extract the actual time-series data I would like to collect in Wavefront.

Log Ingestion Config

Now it’s time to configure the required integration. Wavefront supports a large number of integrations. One of them is the Log Data Integration.

Figure 04: Log Data Integration

If you have never worked with e.g. Logstash or any other solution using grok patterns, this might be the most challenging part of the setup.

The first part of the proxy config is easy: we enable the proxy to listen for log messages coming in either via Filebeat or raw TCP. This is the corresponding part of my proxy config file:

/etc/wavefront/wavefront-proxy/wavefront.conf

#### LOGS TO METRICS SETTINGS #####
## Port on which to listen for FileBeat data (Lumberjack protocol). Default: none
filebeatPort=5044
## Port on which to listen for raw logs data (TCP and HTTP). Default: none
rawLogsPort=5045
## Maximum line length for received raw logs (Default: 4096)
rawLogsMaxReceivedLength=4096
## Maximum allowed request size (in bytes) for incoming HTTP requests with raw logs (Default: 16MB)
rawLogsHttpBufferSize=16777216
## Location of the `logsingestion.yaml` configuration file
logsIngestionConfigFile=/etc/wavefront/wavefront-proxy/logsingestion.yaml

The second part is to have a working log ingestion config file, in my case:

/etc/wavefront/wavefront-proxy/logsingestion.yaml

“Working” means not only accepted by the proxy but also, and even more importantly, extracting the metric(s) we want to forward to Wavefront.

At the moment, I am interested in the current electric power consumption. This is the red-highlighted value in the set of vzlogger messages:

Figure 05: vzlogger message containing the metric

Time for the grok patterns. The task is to extract the electric power consumption value. To test the grok patterns against my log messages I used:

http://grokdebug.herokuapp.com/

Finally, I came up with the following pattern and log ingestion proxy configuration.

Pattern:

[%{MONTH} %{MONTHDAY} %{TIME}][%{WORD}] %{WORD}: id=1-0:1.7.0%{GREEDYDATA} value=%{BASE16FLOAT:currConsumption} ts=%{NUMBER}

The resulting config, /etc/wavefront/wavefront-proxy/logsingestion.yaml:

Please note: it is YAML, so you need to preserve the two spaces of indentation.

aggregationIntervalSeconds: 5  # Metrics are aggregated and sent at this interval
gauges:
  - pattern: '\[%{MONTH} %{MONTHDAY} %{TIME}\]\[%{WORD}\] %{WORD}: id=1-0:1.7.0%{GREEDYDATA} value\=%{BASE16FLOAT:currConsumption} ts=%{NUMBER}'
    metricName: 'myCurrConsumption'
    valueLabel: 'currConsumption'
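To sanity-check the pattern end to end, you can push a sample vzlogger line to the proxy's raw logs port, for example with netcat (a hedged test; the IP is my proxy's address from the rsyslog config below):

echo '[Jan 30 19:21:46][mtr0] Reading: id=1-0:1.7.0255/ObisIdentifier:1-0:1.7.0255 value=386.51 ts=1612030906436' | nc 192.168.0.137 5045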

rsyslog Config

The last step is to reconfigure rsyslog to send the messages to the proxy – or, as I did, to both the Wavefront proxy and vRealize Log Insight. This is my rsyslog config file after adding the second target.

$ModLoad imfile

$InputFileName /var/log/vzlogger/vzlogger.log
$InputFileTag vzlogger
$InputFilePollInterval 10
$InputFileSeverity info
$InputFileFacility local3
$InputRunFileMonitor
# my local Wavefront proxy
local3.* @@192.168.0.137:5045

$InputFileName /var/log/vzlogger/vzlogger.log
$InputFileTag vzlogger
$InputFilePollInterval 10
$InputFileSeverity info
$InputFileFacility local3
$InputRunFileMonitor
# my vRealize Log Insight instance on VMC
local3.* @@xxx.xxx.xxx.xxx:yyyy

Results

The Wavefront UI displays the collected real-time data and offers a wide range of functions and transformations for data analytics.

Figure 06: Real-time data in Wavefront UI

As my proxy is running as a VM on my laptop, I will need to move it to something more reliable :-) Once I have collected more data points and added further metrics provided by vzlogger, I am going to play around with WQL – the Wavefront Query Language.

Outlook

The next challenge is to collect the data using vRealize Operations.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

Energy Consumption Monitoring using SML Data and vRealize Log Insight

It all started with my last electricity bill. Shortly after I had recovered from the shock and made sure that I really do not have an aluminum smelter running in my basement, I decided I needed some kind of monitoring of my electric energy consumption.

Insight into data is the first and probably most important step in the process of taking measures to change any given situation.

My plan was to consume the available Smart Message Language (SML) data and, following this post (sorry, it is in German only), create a real-time dashboard:

http://raspberry.tips/raspberrypi-tutorials/smartmeter-stromzaehler-mit-dem-raspberry-pi-auslesen-und-aufzeichnen

No sooner said than done – and after only a few steps I realized that it would be slightly more work to let the included webserver run on the same Raspberry Pi that is already running my Pi-hole.

OK, invest time to change the scripts and build the dashboard into the Pi-hole web server, or use another Raspberry Pi and start over?

As I already have the data available in the vzlogger.log file, why shouldn’t I use my vRealize Log Insight to display it? Sure, semantically I am dealing with time-series data here and another vRealize product would be more suitable, but I wanted something quick and easy – no worries, vRealize Operations and/or Wavefront integration is already on my to-do list.

The Data

vzlogger (https://github.com/volkszaehler/vzlogger) reads the SML telegrams coming from the smart meter via the serial port and stores the readings every two seconds in a log file. I have configured the log file to be /var/log/vzlogger/vzlogger.log:

[Jan 30 19:21:46][mtr0] Got 5 new readings from meter:
[Jan 30 19:21:46][mtr0] Reading: id=1-0:1.8.0255/ObisIdentifier:1-0:1.8.0255 value=38768802.85 ts=1612030906436
[Jan 30 19:21:46][mtr0] Reading: id=1-0:1.8.1255/ObisIdentifier:1-0:1.8.1255 value=15102120.00 ts=1612030906436
[Jan 30 19:21:46][mtr0] Reading: id=1-0:1.8.2255/ObisIdentifier:1-0:1.8.2255 value=23666680.00 ts=1612030906436
[Jan 30 19:21:46][mtr0] Reading: id=1-0:1.7.0255/ObisIdentifier:1-0:1.7.0255 value=386.51 ts=1612030906436
[Jan 30 19:21:46][mtr0] Reading: id=1-0:96.5.5255/ObisIdentifier:1-0:96.5.5255 value=6560.00 ts=1612030906436

Syslog Configuration

As there is no vRLI agent for the ARM platform, I decided to use rsyslog to send the log file entries to vRealize Log Insight.

The configuration follows the usual rsyslog procedures. I have simply created an appropriate file in /etc/rsyslog.d:

vzlogger.conf

$ModLoad imfile
$InputFileName /var/log/vzlogger/vzlogger.log
$InputFileTag vzlogger
# $InputFileStateFile
$InputFilePollInterval 10
$InputFileSeverity info
$InputFileFacility local3
$InputRunFileMonitor
local3.* @@xxx.xxx.xxx.xxx

xxx.xxx.xxx.xxx in @@xxx.xxx.xxx.xxx is of course the IP of my vRealize Log Insight instance.

vRealize Log Insight Configuration

After restarting the rsyslog daemon, the SML-decoded messages start arriving in vRLI every 10 seconds, as configured in vzlogger.conf.

Figure 01: vzlogger messages in vRealize Log Insight

vRealize Log Insight extracted fields are a good way to extract the actual metrics defined by the Object Identification System (OBIS):

  • ObisIdentifier:1-0:1.7.0 = key 1.7.0 = current consumption (in watts)
  • ObisIdentifier:1-0:1.8.1 = key 1.8.1 = accumulated consumption – rate 1 (meter reading)
  • ObisIdentifier:1-0:1.8.2 = key 1.8.2 = accumulated consumption – rate 2 (meter reading)

To make the extraction of the value efficient, I am using the hostname of the data source as additional context. The additional context ensures that vRLI does not have to parse every single log message it receives – only messages coming from my Raspberry Pi will be parsed.

Figure 02: Extracted Field used to assign the actual metric, consumption, to a searchable field
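Figure 02 shows the actual definition; conceptually it is something like this sketch, with the regex being my assumption based on the message format above:

Extracted field consumption (illustrative):
  extracted value: [0-9.]+
  pre context:     value=
  post context:    \sts=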

Additionally, I extract the OBIS key so I can use both key and value to create my dashboards.

Figure 03: OBIS key as Extracted Field

Now it is easy to show the actual consumption using both extracted fields.

Figure 04: Current consumption as max and average

vRealize Log Insight Dashboard (Sharing)

Information made visible in the vRealize Log Insight Interactive Analytics can be added to a dashboard and shared with other users.

Figure 05: Creating a shared dashboard

The dashboard can have multiple widgets and give you a quick insight into the collected data.

Figure 06: Electric energy consumption dashboard

vRealize Log Insight Partition

Another fairly new (since 8.1) feature of vRealize Log Insight I have used in this example is Data Partitions.

You can retain log data in a partition with a filter and a retention period. Data partitions let you define different retention periods for different types of logs. For example, logs with sensitive information might require a short retention period, such as five days. The log data that matches the filter criteria for a data partition is stored in the partition for the specified retention period. Logs that do not match the filter criteria in any of the defined data partitions are stored in the default partition. This partition is always enabled and stores data for an unlimited amount of time. You can modify the retention period for the default partition.

I have created a partition for the energy data to retain it for 90 days. My host bajor is sending only the vzlogger messages, so I can use the hostname as the filter.

Figure 07: Data partition for the energy consumption data

Outlook

As the next project, I am planning to send the metrics to vRealize Operations or Wavefront to treat them as time-series data and allow for more sophisticated analytics.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

Monitoring vSphere HA and Admission Control Settings using vRealize Operations

vSphere High Availability (vSphere HA) and Admission Control ensure that sufficient resources are reserved for virtual machine recovery when a host fails. Usually, my customers run their vSphere clusters in either N+1 or N*2 configurations, reflected in corresponding Admission Control settings.

In one of my previous blog posts, I described how vRealize Operations helps with capacity management for N+1 and N*2 configured clusters.

In this post, I will describe how vRealize Operations helps to monitor the vSphere infrastructure to find any deviations from the desired HA and Admission Control state.

The dashboard and all needed components can be, as always, found on code.vmware.com:

https://code.vmware.com/samples?id=7508

Motivation

Even if you are responsible for a very small environment with just a few ESXi clusters: do you have a complete, reliable, and current overview of the HA and Admission Control settings on every cluster? Do you know about possible deviations from your desired state?

A few simple vROps super metrics, views, and one dashboard can help you maintain exactly the state of your vSphere environment that ensures sufficient resources for virtual machine recovery when one or multiple hosts fail.

How the Dashboard works

The dashboard will help you answer a few simple questions:

  • Is HA enabled on my ESXi clusters?
  • What Admission Control Policy is configured?
  • What is the current amount (in %) of reserved CPU and memory resources on every single cluster?
  • Does the current amount (in %) of reserved CPU and memory resources configured through Admission Control equal the desired amount as intended by the selected capacity model for the cluster, N+1 or N*2?

The base indicator used to differentiate between the models is a vSphere tag. To make the vROps views work right after importing them, the correct tags need to be assigned to the clusters.

Figure 1: vSphere Tags

These tags are used as filters in the N+1 and N*2 centric views.

Figure 2: Filter for N+1 centric View

For N+1 clusters, we need to calculate the desired value for reserved CPU and memory resources and compare it with the current value calculated by vSphere. To take ESXi hosts in maintenance into account, I have also added additional information regarding the count of ESXi hosts in maintenance and the count of hosts contributing to the current pool of compute resources.

Figure 3: vROps Super Metrics
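The idea behind the super metrics is simple arithmetic. As a hedged illustration (the actual formulas ship with the views and super metrics in the sample on code.vmware.com):

Desired N+1 buffer (%) = 100 / (count of hosts in cluster - count of hosts in maintenance)

A 4-host cluster with no host in maintenance should therefore reserve 100 / 4 = 25%; with one host in maintenance, the desired buffer rises to 100 / 3 ≈ 33%. The super metric compares this desired value with the current buffer reported via Admission Control.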

To make this dashboard work in your environment you need to set the vSphere tags appropriately. Of course, you can use your own tags and adjust the filters in the views accordingly.

Do not forget to enable the imported Super Metrics in your policies.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

Custom Compliance Management using vRealize Operations

As you probably know, vRealize Operations provides several compliance packs basically out of the box (“natively”). A simple click on “ACTIVATE” in the “Repository” tab installs all needed components of a compliance pack and allows the corresponding regulatory benchmarks to be executed.

Regulatory benchmarks provide solutions for industry-standard regulatory compliance requirements to enforce and report on the compliance of your vSphere objects. In the following picture, you can see the six currently available compliance packs.

Figure 1: Native Compliance Packs

But what if your compliance requirements differ from what the available packs provide? What are the components, and what is the method, to create customized or completely new compliance benchmarks?

In this blog post, I will give you a short overview of what vRealize Operations elements comprise a Compliance Pack and how to put everything together to create your very own custom compliance benchmark.

Components of a Compliance Management Pack

The mandatory parts of a compliance pack, which implement the actual checks, are:

  • symptom definitions
  • alert definitions
  • policy (activates the needed metrics, properties, symptom, and alert definitions)

In addition to these components, the available compliance packs provide a report template that consists of views, as well as recommendations, which are part of the alert definitions. The following picture shows the Compliance Pack for CIS as an example.

Figure 2: Content of a Compliance Pack

The Method

The general workflow from a given requirement (“what to check”) to the compliance pack is always the same. The following diagram shows the individual steps. As you can see, you are not limited to the metrics and properties vRealize Operations provides through the various management packs – you can add your own custom metrics and symptoms and make them part of your custom benchmark.

Figure 3: Compliance Pack workflow

In general this is what you need to do:

  1. Find the appropriate metric or property to check a certain aspect of your custom compliance
  2. Create a symptom definition containing that metric or property
  3. Create one or multiple alert definitions (e.g. one per vROps object type) and include all previously created symptom definitions as “ANY” set of definitions
  4. Create or adjust a vROps policy to enable all needed metrics and properties (if disabled)

As always, you may review the native Compliance Packs to see some examples. In the following picture, you can see the alert definitions for different object types as defined in the Compliance Pack for CIS.

Figure 4: Alert definitions in the Compliance Pack for CIS

NOTE: It is required to set the “Alert Subtype” to “Compliance” to allow the alert definition to be part of a custom compliance benchmark.

The alert definition consists of all relevant symptom definitions for the given object type, as shown in the next picture.

Figure 5: Alert definition example

Final Step – Custom Compliance

The last and easiest step is to add the alert definitions to the new custom compliance benchmark and enable them in a vROps policy.

Figure 6: Create a new custom benchmark
Figure 7: Add alert definitions
Figure 8: Select the policy

Finally, vRealize Operations checks the compliance of your environment and presents the results in the compliance widget.

Figure 9: Results of a compliance check

Now, let’s go and create your own customized compliance benchmark.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

Capacity Management for n+1 and n*2 Clusters using vRealize Operations

When it comes to capacity management in vSphere environments using vRealize Operations, customers frequently ask for guidelines on how to set up vROps to properly manage N+1 and N*2 ESXi clusters.

Just as a short reminder: N+1 in the context of an ESXi cluster means that we tolerate (and are hopefully prepared for) the failure of exactly one host. If we need to cope with the failure of 50% of all hosts in a cluster, for example with two fault domains, we often use the term N*2.

In general, we have two options to make vRealize Operations aware of the failure strategy for ESXi clusters:

  • the “out-of-the-box” and very easy approach using vSphere HA and Admission Control
  • the vROps, and almost same easy, way using vRealize Operations Policies

vSphere HA and Admission Control

If configured, Admission Control automatically calculates the reserved CPU and memory failover capacity. In the first example, my cluster is configured to tolerate the failure of one host, which makes it 25% for my 4-host cluster.

Figure 1: vSphere and HA settings – n+1 cluster

vRealize Operations collects this information and calculates the remaining capacity accordingly. In the following picture, you can see vROps recognizing the configured HA buffer of 25%.

Figure 2: vROps HA buffer for n+1 cluster

If we now change the Admission Control settings to n*2 – in my case two ESXi hosts – vSphere calculates the new required CPU and memory buffer. We could also set the buffer manually to 50% or whatever value is required.

Figure 3: vSphere and HA settings – n*2 cluster

After a collection cycle, vRealize Operations retrieves the new settings and starts calculating capacity-related metrics using the adjusted values for available CPU and memory capacity.

Figure 4: vROps HA – available capacity reflecting new HA settings

The “Capacity Remaining” decreases following the new available capacity, and the widget shows the new buffer values in %.

Figure 5: vROps HA buffer for n*2 cluster

vRealize Operations Capacity Buffer and Policies

Sometimes vSphere HA Admission Control is not being used, and customers need another solution for their capacity management requirements.

This is where vROps policies and the Capacity Buffer settings help manage vSphere resources.

vRealize Operations applies various settings to groups of objects using vROps policies. One section of a policy is the capacity settings.

Figure 6: vROps Capacity Settings via Policy

Within the capacity settings, you can define a buffer for CPU, memory, and disk space to reduce the available capacity of a vSphere cluster or a group of clusters. You can set the values for both capacity models, demand and allocation, separately.

Figure 7: vROps Capacity Settings – Buffer

In my example, I have disabled Admission Control in vCenter and set buffers in vROps.

Figure 8: vROps capacity remaining using buffer setting via policy

vRealize Operations is now using the new values for available resources to calculate cluster capacity metrics.

By the way, custom groups are the vROps way to group similar clusters together and treat all of them the same way.

Stay safe.

Thomas – https://twitter.com/ThomasKopton